From wendland at scan-plus.de Thu Jul 1 01:00:48 2004 From: wendland at scan-plus.de (Joerg Wendland) Date: Thu, 1 Jul 2004 03:00:48 +0200 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c Message-ID: <20040701010048.GC25028@dozer> Hi, I am running the cluster package rev. 0406282100 using a custom 2.6.7 kernel (vanilla) on three VMware ESX virtual machines sharing one SCSI disc with GFS on a 10GB logical volume. Setup as proposed in doc/usage.txt, mkfs, mount and first tests all went fine until the first machine crashed after 20 minutes with the following assertion (messages wrapped): lock_dlm: Assertion failed on line 363 of file fs/gfs_locking/lock_dlm/lock.c lock_dlm: assertion: "!error" lock_dlm: time = 2482179 testfs: num=2,178 err=-22 cur=-1 req=5 lkf=0 Kernel panic: lock_dlm: Record message above and reboot. The other two machines were still running, although any process accessing the GFS mountpoint would block indefinitely, with the effect that the whole cluster was torn down. Kind regards, Joerg -- | Entwickler Elektronische Datenverarbeitung und Dienstbetriebsmittel | | Scan-Plus GmbH Dienstbetriebsmittelherstellung fon +49-731-92013-0 | | Koenigstrasse 78, 89077 Ulm, Germany fax +49-731-92013-290 | | Geschaeftsfuehrer: Juergen Hoermann HRB 3220 Amtsgericht Ulm | | PGP-key: 51CF8417 (FP: 79C0 7671 AFC7 315E 657A F318 57A3 7FBD 51CF 8417) | From teigland at redhat.com Thu Jul 1 03:41:55 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 1 Jul 2004 11:41:55 +0800 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: References: Message-ID: <20040701034155.GB11996@redhat.com> On Wed, Jun 30, 2004 at 04:07:57PM -0400, Mark Neal wrote: > is there a place to get more in-depth documentation on GFS than just the > usage.txt file? The new cluster infrastructure Ken alluded to (and some information on how GFS uses it) is documented in the "Symmetric Cluster Architecture" paper, which is a work in progress. http://people.redhat.com/~teigland/sca.pdf -- Dave Teigland From teigland at redhat.com Thu Jul 1 03:48:59 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 1 Jul 2004 11:48:59 +0800 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040701010048.GC25028@dozer> References: <20040701010048.GC25028@dozer> Message-ID: <20040701034859.GC11996@redhat.com> On Thu, Jul 01, 2004 at 03:00:48AM +0200, Joerg Wendland wrote: > Hi, > > I am running the cluster package rev. 0406282100 using a custom 2.6.7 > kernel (vanilla) on three VMware ESX virtual machines sharing one SCSI > disc with GFS on a 10GB logical volume.
Setup as proposed in doc/usage.txt, > mkfs, mount and first tests all went fine until the first machine crashed > after 20 minutes with the following assertion (messages wrapped): > > lock_dlm: Assertion failed on line 363 of file > fs/gfs_locking/lock_dlm/lock.c > lock_dlm: assertion: "!error" > lock_dlm: time = 2482179 > testfs: num=2,178 err=-22 cur=-1 req=5 lkf=0 > > Kernel panic: lock_dlm: Record message above and reboot. This is a bug we know of and are working on right now. > The other two machines were still running although any process accessing > the GFS mountpoint would block infinitely with the effect that the whole > cluster is torn down. You're using manual fencing so I suspect the remaining nodes are waiting for you to verify the node is dead and then run "fence_ack_manual" on the node that's running fence_manual (look in /var/log/messages for the relevant message on that machine.) -- Dave Teigland From dice at mfa.kfki.hu Thu Jul 1 04:28:16 2004 From: dice at mfa.kfki.hu (Gergely Tamas) Date: Thu, 1 Jul 2004 06:28:16 +0200 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: <20040630200604.GA26510@potassium.msp.redhat.com> References: <4diduga40c0c63s.300620041051@mail.nextresponse.com> <20040630200604.GA26510@potassium.msp.redhat.com> Message-ID: <20040701042816.GA361@mfa.kfki.hu> Hi! > The most simple setup is to have one machine with a big pile of IDE disks > export thost disks to an IP network with GNBD (or iSCSI). Does anyone know a reliable iSCSI (server side) software implementation? Thanks in advance, Gergely From tom at regio.net Fri Jul 2 08:30:49 2004 From: tom at regio.net (tom at regio.net) Date: Fri, 2 Jul 2004 10:30:49 +0200 Subject: [Linux-cluster] Problems with gnbd Message-ID: Hi all, i have a little problem with gnbd_import : if i start gnbd_import the following error appears : gnbd_import gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory Anyone have a idea? -tom From Gareth at Linux.co.uk Fri Jul 2 08:41:46 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 09:41:46 +0100 Subject: [Linux-cluster] Compiling GFS .. Message-ID: <1088757706.721.198.camel@squizzey> Hi, I'm trying to compile the current /cluster cvs against 2.6.7 and get the following error .. anyone any idea what I'm doing wrong ? (cd /custer && ./configure && make) tia Gareth. make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' rm -f cluster service.h cnxman.h cnxman-socket.h ln -s . cluster ln -s //usr/include/cluster/service.h . ln -s //usr/include/cluster/cnxman.h . ln -s //usr/include/cluster/cnxman-socket.h . make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules USING_KBUILD=yes make[3]: Entering directory `/usr/src/linux-2.6.7' CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. h: No such file or directory make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 make[3]: Leaving directory `/usr/src/linux-2.6.7' make[2]: *** [all] Error 2 make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' make: *** [all] Error 2 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erik at debian.franken.de Fri Jul 2 09:10:35 2004 From: erik at debian.franken.de (Erik Tews) Date: Fri, 02 Jul 2004 11:10:35 +0200 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: References: Message-ID: <1088759434.6929.4.camel@localhost> On Fri, 02.07.2004, at 10:30, tom at regio.net wrote: > gnbd_import > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory First idea: do you have sysfs mounted? From tom at regio.net Fri Jul 2 09:35:24 2004 From: tom at regio.net (tom at regio.net) Date: Fri, 2 Jul 2004 11:35:24 +0200 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: <1088759434.6929.4.camel@localhost> Message-ID: Hi, I think the problem is /sys/class/gnbd/gnbd0/name -- I don't have this path/device or whatever it is ;) I just have /dev/gnbd and /dev/gnbd_ctl -tom On 02.07.2004 11:10, Erik Tews wrote (Re: [Linux-cluster] Problems with gnbd): On Fri, 02.07.2004, at 10:30, tom at regio.net wrote: > gnbd_import > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory First idea: do you have sysfs mounted? -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From rmayhew at mweb.com Fri Jul 2 13:18:46 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Fri, 2 Jul 2004 15:18:46 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Hi All, I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade from 2.4.21-15.02 to be able to install the GFS RPMS). I built and installed the supplied ES 3.0 RPMs, but when it comes to doing the depmod -a I end up with this. #depmod -a Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o Does anyone have any pointers? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za From danderso at redhat.com Fri Jul 2 13:35:43 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 08:35:43 -0500 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <1088757706.721.198.camel@squizzey> References: <1088757706.721.198.camel@squizzey> Message-ID: <200407020835.43026.danderso@redhat.com> Gareth: (cd cluster && ./configure && make install) See if that works. On Friday 02 July 2004 03:41, Gareth Bult wrote: > Hi, > > I'm trying to compile the current /cluster cvs against 2.6.7 and get the > following error .. anyone any idea what I'm doing wrong ? > (cd /custer && ./configure && make) > > tia > Gareth.
> > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > rm -f cluster service.h cnxman.h cnxman-socket.h > ln -s . cluster > ln -s //usr/include/cluster/service.h . > ln -s //usr/include/cluster/cnxman.h . > ln -s //usr/include/cluster/cnxman-socket.h . > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > USING_KBUILD=yes > make[3]: Entering directory `/usr/src/linux-2.6.7' > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. > h: No such file or directory > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > make[3]: Leaving directory `/usr/src/linux-2.6.7' > make[2]: *** [all] Error 2 > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > make: *** [all] Error 2 From danderso at redhat.com Fri Jul 2 13:38:46 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 08:38:46 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Message-ID: <200407020838.46175.danderso@redhat.com> Richard: Which hardware architecture are your machines? Assuming Intel x86, make sure you use the i686.rpms instead of the i386.rpms. On Friday 02 July 2004 08:18, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From bdcneal at budget.state.ny.us Fri Jul 2 13:34:59 2004 From: bdcneal at budget.state.ny.us (Mark Neal) Date: Fri, 02 Jul 2004 09:34:59 -0400 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: one way to avoid this is to: 1) grab the kernel source (kernel-source-2.4.21-15.0.2.EL) 2) apply the patches that come in the gfs rpm (GFS-6.0.0-1.2.TL1.src.rpm) 3) compile with your current config file (make sure to do a make oldconfig to be safe) Mark Neal System Administrator - Web Services NYS Division of Budget (518) 402-4181 >>> rmayhew at mweb.com 07/02/04 09:19 AM >>> Hi All, I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade from 2.4.21-15.02 to be able to install the GFS RPMS). 
I build and installed the supplied ES 3.0 RPMS, but when it comes to doing the depmod -a I end up with this. #depmod -a Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o Does any one have any pointers? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From amanthei at redhat.com Fri Jul 2 13:36:48 2004 From: amanthei at redhat.com (Adam Manthei) Date: Fri, 2 Jul 2004 08:36:48 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B281B@mwjdc2.mweb.com> Message-ID: <20040702133648.GC23240@redhat.com> On Fri, Jul 02, 2004 at 03:18:46PM +0200, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? Make sure the kernel versions and architectures match. For example, if your kernel is i686 SMP, then make sure you have i686 SMP gfs modules too. > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei From rmayhew at mweb.com Fri Jul 2 13:44:26 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Fri, 2 Jul 2004 15:44:26 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2833@mwjdc2.mweb.com> Hi Thanks for the quick response. I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 SAN. I grabbed the 3 RPMS from ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ Should I be downloading them for another source? Is there support yet for the latest ES 3.0 kernel? After rebuilding these RPMS' I only end up with the following. 
GFS-6.0.0-1.2.i386.rpm GFS-6.0.0-1.2.src.rpm GFS-debuginfo-6.0.0-1.2.i386.rpm GFS-devel-6.0.0-1.2.i386.rpm GFS-modules-6.0.0-1.2.i386.rpm perl-Net-Telnet-3.03-2.noarch.rpm perl-Net-Telnet-3.03-2.src.rpm rh-gfs-en-6.0-4.noarch.rpm rh-gfs-en-6.0-4.src.rpm This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm Thanks. -----Original Message----- From: Derek Anderson [mailto:danderso at redhat.com] Sent: 02 July 2004 03:39 PM To: Discussion of clustering software components including GFS; Richard Mayhew Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard: Which hardware architecture are your machines? Assuming Intel x86, make sure you use the i686.rpms instead of the i386.rpms. On Friday 02 July 2004 08:18, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From Gareth at Linux.co.uk Fri Jul 2 13:59:57 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 14:59:57 +0100 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <200407020835.43026.danderso@redhat.com> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> Message-ID: <1088776797.724.202.camel@squizzey> :) very funny. I've found by creating /usr/include/cluster and copying in a few headers I managed to make it build .. still experimenting ... Gareth. On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > Gareth: > > (cd cluster && ./configure && make install) > > See if that works. > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > Hi, > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get the > > following error .. anyone any idea what I'm doing wrong ? > > (cd /custer && ./configure && make) > > > > tia > > Gareth. > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > rm -f cluster service.h cnxman.h cnxman-socket.h > > ln -s . cluster > > ln -s //usr/include/cluster/service.h . > > ln -s //usr/include/cluster/cnxman.h . > > ln -s //usr/include/cluster/cnxman-socket.h . > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > USING_KBUILD=yes > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. 
> > h: No such file or directory > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > make[2]: *** [all] Error 2 > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > make[1]: *** [all] Error 2 > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > make: *** [all] Error 2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From danderso at redhat.com Fri Jul 2 14:42:20 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 2 Jul 2004 09:42:20 -0500 Subject: [Linux-cluster] Compiling GFS .. In-Reply-To: <1088776797.724.202.camel@squizzey> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> <1088776797.724.202.camel@squizzey> Message-ID: <200407020942.20781.danderso@redhat.com> No, seriously. The 'make install' target should work without moving header files around. On Friday 02 July 2004 08:59, Gareth Bult wrote: > :) very funny. > > I've found by creating /usr/include/cluster and copying in a few headers > I managed to make it build .. still experimenting ... > > Gareth. > > On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > > Gareth: > > > > (cd cluster && ./configure && make install) > > > > See if that works. > > > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > > Hi, > > > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get > > > the following error .. anyone any idea what I'm doing wrong ? > > > (cd /custer && ./configure && make) > > > > > > tia > > > Gareth. > > > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > > rm -f cluster service.h cnxman.h cnxman-socket.h > > > ln -s . cluster > > > ln -s //usr/include/cluster/service.h . > > > ln -s //usr/include/cluster/cnxman.h . > > > ln -s //usr/include/cluster/cnxman-socket.h . > > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > > USING_KBUILD=yes > > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. 
> > > h: No such file or directory > > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > > make[2]: *** [all] Error 2 > > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > > make[1]: *** [all] Error 2 > > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > > make: *** [all] Error 2 From bmarzins at redhat.com Fri Jul 2 15:28:08 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 2 Jul 2004 10:28:08 -0500 Subject: [Linux-cluster] GFS with ATA/IDE drives In-Reply-To: <20040701042816.GA361@mfa.kfki.hu> References: <4diduga40c0c63s.300620041051@mail.nextresponse.com> <20040630200604.GA26510@potassium.msp.redhat.com> <20040701042816.GA361@mfa.kfki.hu> Message-ID: <20040702152808.GB27303@phlogiston.msp.redhat.com> On Thu, Jul 01, 2004 at 06:28:16AM +0200, Gergely Tamas wrote: > Hi! > > > The most simple setup is to have one machine with a big pile of IDE disks > > export thost disks to an IP network with GNBD (or iSCSI). > > Does anyone know a reliable iSCSI (server side) software implementation? > > Thanks in advance, > Gergely There are two that I've tried. http://www.ardistech.com/iscsi/ and http://sourceforge.net/projects/unh-iscsi/ both seem to work reasonably well. The ArdisTech target is only for 2.4 but has a more reasonable UI The UNH target works better, as far as I can tell, and works on 2.6, but it was obviously designed to test the UNH iscsi initiator. The UI leaves tons to be desired. -Ben > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From bmarzins at redhat.com Fri Jul 2 15:35:02 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 2 Jul 2004 10:35:02 -0500 Subject: [Linux-cluster] Problems with gnbd In-Reply-To: References: <1088759434.6929.4.camel@localhost> Message-ID: <20040702153502.GC27303@phlogiston.msp.redhat.com> On Fri, Jul 02, 2004 at 11:35:24AM +0200, tom at regio.net wrote: > > > > > Hi, > > in think the problem ist /sys/class/gnbd/gnbd0/name > > i dont have this path/devive or what ever it is ;) > > i just have /dev/gnbd and /dev/gnbd_ctl > > -tom That is definitely the problem. That is a sysfs file, and you probably don't have sysfs mounted. run the command: # mount -t sysfs sysfs /sys For more sysfs info, see Documentation/filesystems/sysfs.txt in your kernel directory. -Ben > > > > > Erik Tews > ken.de> To > Sent by: Discussion of clustering software > linux-cluster-bou components including GFS > nces at redhat.com > cc > > 02.07.2004 11:10 Subject > Re: [Linux-cluster] Problems with > gnbd > Please respond to > Discussion of > clustering > software > components > including GFS > dhat.com> > > > > > > > Am Fr, den 02.07.2004 schrieb tom at regio.net um 10:30: > > gnbd_import > > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > > file or directory > > First idea, do you got sysfs mounted? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From Gareth at Linux.co.uk Fri Jul 2 15:38:48 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 16:38:48 +0100 Subject: [Linux-cluster] Compiling GFS .. 
In-Reply-To: <200407020942.20781.danderso@redhat.com> References: <1088757706.721.198.camel@squizzey> <200407020835.43026.danderso@redhat.com> <1088776797.724.202.camel@squizzey> <200407020942.20781.danderso@redhat.com> Message-ID: <1088782728.721.213.camel@squizzey> Mmm.. I did try various combinations and I thought I'd tried that .. I'll try another box to confirm as soon as I can get the first one to do something .. Regards, Gareth. On Fri, 2004-07-02 at 09:42 -0500, Derek Anderson wrote: > No, seriously. The 'make install' target should work without moving header > files around. > > On Friday 02 July 2004 08:59, Gareth Bult wrote: > > :) very funny. > > > > I've found by creating /usr/include/cluster and copying in a few headers > > I managed to make it build .. still experimenting ... > > > > Gareth. > > > > On Fri, 2004-07-02 at 08:35 -0500, Derek Anderson wrote: > > > Gareth: > > > > > > (cd cluster && ./configure && make install) > > > > > > See if that works. > > > > > > On Friday 02 July 2004 03:41, Gareth Bult wrote: > > > > Hi, > > > > > > > > I'm trying to compile the current /cluster cvs against 2.6.7 and get > > > > the following error .. anyone any idea what I'm doing wrong ? > > > > (cd /custer && ./configure && make) > > > > > > > > tia > > > > Gareth. > > > > > > > > make[2]: Entering directory `/root/cvs/cluster/dlm-kernel/src' > > > > rm -f cluster service.h cnxman.h cnxman-socket.h > > > > ln -s . cluster > > > > ln -s //usr/include/cluster/service.h . > > > > ln -s //usr/include/cluster/cnxman.h . > > > > ln -s //usr/include/cluster/cnxman-socket.h . > > > > make -C /usr/src/linux-2.6 M=/root/cvs/cluster/dlm-kernel/src modules > > > > USING_KBUILD=yes > > > > make[3]: Entering directory `/usr/src/linux-2.6.7' > > > > CC [M] /root/cvs/cluster/dlm-kernel/src/ast.o > > > > In file included from /root/cvs/cluster/dlm-kernel/src/ast.c:20: > > > > /root/cvs/cluster/dlm-kernel/src/dlm_internal.h:36:29: cluster/service. > > > > h: No such file or directory > > > > make[4]: *** [/root/cvs/cluster/dlm-kernel/src/ast.o] Error 1 > > > > make[3]: *** [_module_/root/cvs/cluster/dlm-kernel/src] Error 2 > > > > make[3]: Leaving directory `/usr/src/linux-2.6.7' > > > > make[2]: *** [all] Error 2 > > > > make[2]: Leaving directory `/root/cvs/cluster/dlm-kernel/src' > > > > make[1]: *** [all] Error 2 > > > > make[1]: Leaving directory `/root/cvs/cluster/dlm-kernel' > > > > make: *** [all] Error 2 > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jul 2 18:53:35 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 02 Jul 2004 14:53:35 -0400 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) Message-ID: <1088794415.25468.8.camel@atlantis.boston.redhat.com> Place in /etc and /etc/cluster; salt to taste. You *need* fencing, even if it's just fence-manual. -- Lon From Gareth at Linux.co.uk Fri Jul 2 19:07:05 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Fri, 02 Jul 2004 20:07:05 +0100 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088794415.25468.8.camel@atlantis.boston.redhat.com> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> Message-ID: <1088795225.721.218.camel@squizzey> Tvm. Is there any documentation on this anywhere ... ? tia Gareth. 
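Only a couple of attribute fragments of the XML Lon attached survive in the quote below, so as a rough sketch only -- every element, attribute and value here is a placeholder based on the cluster.conf conventions discussed on this list, not a reconstruction of the original attachment -- a minimal one-node cluster.xml with manual fencing might look something like this:

    <?xml version="1.0"?>
    <!-- hypothetical minimal one-node configuration; all names are placeholders -->
    <cluster name="alpha" config_version="1">
        <cman expected_votes="1"/>
        <clusternodes>
            <clusternode name="node1" votes="1">
                <fence>
                    <method name="single">
                        <!-- the parameter fence_manual expects here is a guess -->
                        <device name="human" nodename="node1"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <fencedevices>
            <fencedevice name="human" agent="fence_manual"/>
        </fencedevices>
    </cluster>

With a single node holding one vote, expected_votes="1" keeps the cluster quorate, and fence_manual covers the "you *need* fencing" requirement without any fencing hardware.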
On Fri, 2004-07-02 at 14:53 -0400, Lon Hohberger wrote: > Place in /etc and /etc/cluster; salt to taste. You *need* fencing, even > if it's just fence-manual. > > -- Lon > > > > > > > > > > > > > > > > > > > > > > > login="apc" password="apc"/> > password="wti"/> > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From lhh at redhat.com Fri Jul 2 20:12:15 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 02 Jul 2004 16:12:15 -0400 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088795225.721.218.camel@squizzey> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> <1088795225.721.218.camel@squizzey> Message-ID: <1088799135.25468.13.camel@atlantis.boston.redhat.com> On Fri, 2004-07-02 at 20:07 +0100, Gareth Bult wrote: > Tvm. > > Is there any documentation on this anywhere ... ? Not a lot. The format is still subject to change to some degree. Eventually, there will be a GUI app for configuring it, so you won't have to memorize the XML tags ;) -- Lon From lists at wikidev.net Sat Jul 3 00:15:56 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Sat, 03 Jul 2004 02:15:56 +0200 Subject: [Linux-cluster] ccsd hanging after start on debian unstable Message-ID: <1088813756.2249.15.camel@venus> Hello, i have some problems with ccsd on debian unstable- it hangs after starting and eats 100% cpu. A normal kill is enough to stop it again. I've done a strace of ccsd -n, full log at http://dl.aulinx.de/gfs/ccsd.strace. I'm willing to investigate the cause for this, any pointers appreciated- here or in #linux-cluster (nick gwicke). Thanks -- Gabriel Wicke From jeff at intersystems.com Sat Jul 3 14:33:56 2004 From: jeff at intersystems.com (Jeff) Date: Sat, 3 Jul 2004 10:33:56 -0400 Subject: [Linux-cluster] Some GDLM questions Message-ID: <104121513.20040703103356@intersystems.com> These are from reviewing http://people.redhat.com/~teigland/sca.pdf and the CVS copy of cluster/dlm/doc/libdlm.txt. ------------------------------------------------------------------ If a program requests a lock on the AST side can it wait for the lock to complete without returning from the original AST routine? Would it use the poll/select mechanism to do this? What's the best way to implement a blocking lock request in an application where some requests are synchronous and some are asynchronous? Use semop() after the lock request and in the lock completion routine? Is semop() safe to call from a thread on Linux? Would pthread_cond_wait()/pthread_cond_signal() be better? Does conversion deadlock occur only when a conversion is about to be queued and its granted/requested state is incompatible with another lock already on the conversion queue? (eg. there is a PR->EX conversion queued and another PR->EX conversion is about to be queued) Other DLMs do not deliver a blocking AST to a lock which is not on the granted queue. This means that a lock which queued for conversion will not get a blocking AST if it is interfering with another lock being added to the conversion queue. Does GDLM do this as well or are blocking ASTs delivered to all locks regardless of their state? GDLM is not listed as a client of FENCE. 
This seems to imply that a GDLM application has to interact directly with FENCE to deal with the unknown state problem in a 2 node cluster where each member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) as otherwise the same lockspace could end up existing on multiple machines in a single cluster. How would an application interact with FENCE to prevent this or does this have to be handled by configuring the cluster to reboot in this case? libdlm.txt has a vague comment which reads: One further point about lockspace operations is that there is no locking on the creating/destruction of lockspaces in the library so it is up to the application to only call dlm_*_lockspace when it is sure that no other locking operations are likely to be happening. Does this mean 'no other locking operations' by the process which is creating the lockspace? no other requests to create a lock space on that cluster member? in the cluster as a whole? Possible Enhancements: ---------------------- The following two items are areas where GDLM appears to differ from the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for Linux which is derived from IBM's DLM for AIX). These differences aren't incompatible with GFS's requirements and could be implemented as optional behaviors. I'd be happy to work on patches for these if they would be welcome. GDLM is described as granting new lock requests as long as they are compatible with the existing lock mode regardless of the existence of a conversion queue. The other DLMs mentioned above always queue new lock requests if there are any locks on the conversion queue. Certain mechanisms can't be implemented without this kind of ordering. Would it be possible to make the alternate behavior a property of the lock space or a property of a grant request so it can be utilized where necessary? Certain tasks are simplified if the return status of a lock indicates whether it was granted immediately or ended up on the waiting queue. Other DLMs which have both synchronous and asynchronous completion mechanisms implement this via a flag which requests synchronous completion if the lock is available, otherwise the request is queued and the asynchronous mechanism is used. This is particularly useful for deadman locks that control recovery to distinguish between the first instance of a service to start and recovery conditions. There are other (more complex) techniques to implement this but even though GDLM is purely an asynchronous mechanism, it still would be possible for the completion status to indicate (if requested) whether the lock was granted immediately or not. From Gareth at Linux.co.uk Sat Jul 3 16:20:39 2004 From: Gareth at Linux.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 17:20:39 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088813756.2249.15.camel@venus> References: <1088813756.2249.15.camel@venus> Message-ID: <1088871581.7439.1.camel@rag.linux.co.uk> Hi, fyi; I get this both on AMD64 and x86 test boxes .. I've tried two methods; a. CVS b. EBuilds from Datacore (!) Same results, strace says the "network is down" in a loop using 100% CPU Gareth. On Sat, 2004-07-03 at 01:15, Gabriel Wicke wrote: > Hello, > > i have some problems with ccsd on debian unstable- it hangs after > starting and eats 100% cpu. A normal kill is enough to stop it again. > > I've done a strace of ccsd -n, full log at > http://dl.aulinx.de/gfs/ccsd.strace. 
> > I'm willing to investigate the cause for this, any pointers appreciated- > here or in #linux-cluster (nick gwicke). > > Thanks From teigland at redhat.com Sat Jul 3 16:00:47 2004 From: teigland at redhat.com (David Teigland) Date: Sun, 4 Jul 2004 00:00:47 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040703160047.GD8257@redhat.com> > GDLM is not listed as a client of FENCE. This seems to imply > that a GDLM application has to interact directly with FENCE to > deal with the unknown state problem in a 2 node cluster where each > member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) > as otherwise the same lockspace could end up existing on multiple > machines in a single cluster. How would an application interact > with FENCE to prevent this or does this have to be handled by > configuring the cluster to reboot in this case? This is the quickest one to answer right off the bat. We'll get to the others over the next few days I expect. Fencing is a service that runs on its own in a CMAN cluster; it's entirely independent from other services. GFS simply checks to verify fencing is running before allowing a mount since it's especially dangerous for a mount to succeed without it. As soon as a node joins a fencing domain it will be fenced by another domain member if it fails. i.e. as soon as a node runs: > cman_tool join (joins the cluster) > fence_tool join (starts fenced which joins the default fence domain) it will be fenced by other domain members if it fails. So, you simply need to configure your nodes to run fence_tool join after joining the cluster if you want fencing to happen. You can add any checks later on that you think are necessary to be sure that the node is in the fence domain. (Looking at /proc/cluster/services is one way.) Running fence_tool leave will remove a node cleanly from the fence domain (it won't be fenced by other members.) One note of warning. If the fence daemon (fenced) process is killed on node X, it appears to fenced processes on other nodes that X has left the domain cleanly (just as if it had run fence_tool leave). X only leaves the domain "uncleanly" when the node itself fails (meaning the cluster manager decides X has failed.) There is some further development planned to address this. -- Dave Teigland From Gareth at Bult.co.uk Sat Jul 3 10:05:17 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 11:05:17 +0100 Subject: [Linux-cluster] one-node cluster.xml (from question on IRC) In-Reply-To: <1088799135.25468.13.camel@atlantis.boston.redhat.com> References: <1088794415.25468.8.camel@atlantis.boston.redhat.com> <1088795225.721.218.camel@squizzey> <1088799135.25468.13.camel@atlantis.boston.redhat.com> Message-ID: <1088849117.724.222.camel@squizzey> Ok, Well here's a rather useful page someone passed me .. :) https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.Install It even has sample cluster.xml's ;-) Regards, Gareth. On Fri, 2004-07-02 at 16:12 -0400, Lon Hohberger wrote: > On Fri, 2004-07-02 at 20:07 +0100, Gareth Bult wrote: > > Tvm. > > > > Is there any documentation on this anywhere ... ? > > Not a lot. The format is still subject to change to some degree. 
> Eventually, there will be a GUI app for configuring it, so you won't > have to memorize the XML tags ;) > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-4.png Type: image/png Size: 822 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 3 10:07:25 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 11:07:25 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088813756.2249.15.camel@venus> References: <1088813756.2249.15.camel@venus> Message-ID: <1088849245.724.224.camel@squizzey> Hi, ccsd starts a number of threads .. pick the one eating the CPU and "strace -p" it .. Gareth. On Sat, 2004-07-03 at 02:15 +0200, Gabriel Wicke wrote: > Hello, > > i have some problems with ccsd on debian unstable- it hangs after > starting and eats 100% cpu. A normal kill is enough to stop it again. > > I've done a strace of ccsd -n, full log at > http://dl.aulinx.de/gfs/ccsd.strace. > > I'm willing to investigate the cause for this, any pointers appreciated- > here or in #linux-cluster (nick gwicke). > > Thanks -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 3 19:56:10 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 03 Jul 2004 20:56:10 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088871581.7439.1.camel@rag.linux.co.uk> References: <1088813756.2249.15.camel@venus> <1088871581.7439.1.camel@rag.linux.co.uk> Message-ID: <1088884570.30660.1.camel@squizzey> Ok, this was as a result of an invalid tag in cluster.xml .. is there a way to validate cluster.xml is it ccsd does not appear to print any warnings/errors if the file is invalid .. ? Gareth. On Sat, 2004-07-03 at 17:20 +0100, Gareth Bult wrote: > Hi, > > fyi; I get this both on AMD64 and x86 test boxes .. > > I've tried two methods; > a. CVS > b. EBuilds from Datacore > > (!) > > Same results, strace says the "network is down" in a loop using 100% CPU > > Gareth. > > On Sat, 2004-07-03 at 01:15, Gabriel Wicke wrote: > > Hello, > > > > i have some problems with ccsd on debian unstable- it hangs after > > starting and eats 100% cpu. A normal kill is enough to stop it again. > > > > I've done a strace of ccsd -n, full log at > > http://dl.aulinx.de/gfs/ccsd.strace. > > > > I'm willing to investigate the cause for this, any pointers appreciated- > > here or in #linux-cluster (nick gwicke). > > > > Thanks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sun Jul 4 00:09:42 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sun, 04 Jul 2004 01:09:42 +0100 Subject: [Linux-cluster] Possible problem with different architectures Message-ID: <1088899782.11202.9.camel@squizzey> Hi, With help from the guys on #linux-cluster ( thanks guys :) ) I've managed to get a 3-node cluster running. Two of the nodes are x86 and the third is an amd64 - all are running identical Gentoo installs on kernel 2.6.7. All are running an up-to-date cvs /cluster. I can successfully export a device from one x86 box to another, then format/mount a gfs on it on both x86 boxes - this works great. However, I can't run gnbd_import on the amd64 box. I get; gnbd_import: /dev/gnbd/netdisc is not in use. deleting gnbd_import: created gnbd device netdisc2 gnbd_monitor: gnbd_monitor started. Monitoring device #0 gnbd_import: ERROR gnbd_recvd failed It "looks" like gnbd_recvd is failing to complete a handshake, i.e. hanging half way through .. .. Any suggestions welcome. On another note, I've had a number of kernel crashes and I'm wondering looking at the logs whether it's because I'm running a preemtable kernel ... ? Here are two sample crash dumps from syslog.. typically the machine goes D-state on the processes involved and won't shutdown cleanly ... Crash #1 (x86 box): Jul 3 22:44:48 rag CMAN: node squizzey.linux.co.uk is not responding - removing from the cluster Jul 3 22:44:53 rag dlm: clvmd: recover event 2 (first) Jul 3 22:44:53 rag dlm: clvmd: add nodes Jul 3 22:44:53 rag Unable to handle kernel paging request at virtual address 0c000000 Jul 3 22:44:53 rag printing eip: Jul 3 22:44:53 rag c013c2cb Jul 3 22:44:53 rag *pde = 00000000 Jul 3 22:44:53 rag Oops: 0000 [#1] Jul 3 22:44:53 rag PREEMPT Jul 3 22:44:53 rag Modules linked in: gnbd gfs lock_dlm dlm cman lock_harness ohci_hcd e100 mii snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd uhci_hcd intel_agp agpgart st usb_storage scsi_mod ehci_hcd usbcore Jul 3 22:44:53 rag CPU: 0 Jul 3 22:44:53 rag EIP: 0060:[] Not tainted Jul 3 22:44:53 rag EFLAGS: 00010292 (2.6.7) Jul 3 22:44:53 rag EIP is at page_address+0xb/0xb0 Jul 3 22:44:53 rag eax: 0c000000 ebx: 0c000000 ecx: 00000000 edx: 18e0e600 Jul 3 22:44:53 rag esi: 18e0e600 edi: e0e600b8 ebp: e0e600e8 esp: e0e15e1c Jul 3 22:44:53 rag ds: 007b es: 007b ss: 0068 Jul 3 22:44:53 rag Process dlm_recoverd (pid: 9579, threadinfo=e0e14000 task=e6542eb0) Jul 3 22:44:53 rag Stack: 00000000 e0e60001 18e0e600 e0e600b8 e0e600e8 e85baee1 0c000000 e85c84b7 Jul 3 22:44:53 rag 18e0e600 18000000 00000018 e0e15ee0 00000002 00000002 e85bb3cf 00000002 Jul 3 22:44:53 rag 00000018 000000d0 e0e15e6c 00000000 00000000 00000018 e0e15ee0 00000002 Jul 3 22:44:53 rag Call Trace: Jul 3 22:44:53 rag [] lowcomms_get_buffer+0x81/0x150 [dlm] Jul 3 22:44:53 rag [] lowcomms_send_message+0x3f/0xf0 [dlm] Jul 3 22:44:53 rag [] midcomms_send_message+0x44/0x70 [dlm] Jul 3 22:44:53 rag [] rcom_send_message+0xd1/0x210 [dlm] Jul 3 22:44:53 rag [] gdlm_wait_status_low+0x60/0x90 [dlm] Jul 3 22:44:53 rag [] nodes_reconfig_wait+0x2a/0x80 [dlm] Jul 3 22:44:53 rag [] ls_nodes_init+0xbf/0x150 [dlm] Jul 3 22:44:53 rag [] ls_first_start+0x62/0x160 [dlm] Jul 3 22:44:53 rag [] do_ls_recovery+0x1ed/0x430 [dlm] Jul 3 22:44:53 rag [] 
dlm_recoverd+0x143/0x180 [dlm] Jul 3 22:44:53 rag [] default_wake_function+0x0/0x20 Jul 3 22:44:53 rag [] ret_from_fork+0x6/0x14 Jul 3 22:44:53 rag [] default_wake_function+0x0/0x20 Jul 3 22:44:53 rag [] dlm_recoverd+0x0/0x180 [dlm] Jul 3 22:44:53 rag [] kernel_thread_helper+0x5/0x18 Jul 3 22:44:53 rag Jul 3 22:44:53 rag Code: 8b 03 f6 c4 01 75 1e 8b 2d 8c 63 48 c0 29 eb c1 fb 05 c1 e3 Jul 3 22:44:53 rag ccsd[9560]: Error while processing get: No data available Crash #2: (amd64) Jul 3 21:42:28 squizzey dlm: clvmd: recover event 2 (first) Jul 3 21:42:28 squizzey dlm: clvmd: add nodes Jul 3 21:42:28 squizzey Unable to handle kernel NULL pointer dereference at 000000000000008a RIP: Jul 3 21:42:28 squizzey {:dlm:send_to_sock+54} Jul 3 21:42:28 squizzey PML4 3f7a9067 PGD b591067 PMD 0 Jul 3 21:42:28 squizzey Oops: 0000 [1] PREEMPT Jul 3 21:42:28 squizzey CPU 0 Jul 3 21:42:28 squizzey Modules linked in: gnbd lock_dlm dlm cman gfs lock_harness dm_mod ipt_ttl ipt_limit ipt_state iptable_filter iptable_mangle ipt_LOG ipt_MASQUERADE ipt_TOS ipt_REDIRECT iptable_nat ipt_REJECT ip_tables ip_conntrack_irc ip_conntrack_ftp ip_conntrack nvidia usblp usbhid forcedeth ohci_hcd snd_intel8x0 snd_ac97_codec snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_page_alloc snd_timer snd_mixer_oss snd usb_storage ehci_hcd usbcore Jul 3 21:42:28 squizzey Pid: 31748, comm: dlm_sendd Tainted: P 2.6.7 Jul 3 21:42:28 squizzey RIP: 0010:[] {:dlm:send_to_sock+54} Jul 3 21:42:28 squizzey RSP: 0018:00000100319b5ec8 EFLAGS: 00010202 Jul 3 21:42:28 squizzey RAX: 0000000000000002 RBX: ffffffffa06ca0f0 RCX: 00000100139c80c0 Jul 3 21:42:28 squizzey RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000100139c80b8 Jul 3 21:42:28 squizzey RBP: 00000100139c80a8 R08: 00000100319b4000 R09: 0000000000000000 Jul 3 21:42:28 squizzey R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010030d1d150 Jul 3 21:42:28 squizzey R13: 00000100139c80a8 R14: 0000000000000000 R15: 000000358cc16f78 Jul 3 21:42:28 squizzey FS: 000000358d80f640(0000) GS:ffffffff804f61c0 (0000) knlGS:0000000000000000 Jul 3 21:42:28 squizzey CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Jul 3 21:42:28 squizzey CR2: 000000000000008a CR3: 0000000000101000 CR4: 00000000000006e0 Jul 3 21:42:28 squizzey Process dlm_sendd (pid: 31748, threadinfo 00000100319b4000, task 000001000676a000) Jul 3 21:42:28 squizzey Stack: 0000007a319b5f08 00000100139c80b8 0000000000000a64 ffffffffa06ca0f0 Jul 3 21:42:28 squizzey 00000100139c80a8 0000010030d1d150 0000000000000005 00000100297df89c Jul 3 21:42:28 squizzey 000000358cc16f78 ffffffffa06b637d Jul 3 21:42:28 squizzey Call Trace: {:dlm:process_output_queue+157} {:dlm:dlm_sendd+184} Jul 3 21:42:28 squizzey {child_rip+8} {:dlm:dlm_sendd+0} Jul 3 21:42:28 squizzey {child_rip+0} Jul 3 21:42:28 squizzey Jul 3 21:42:28 squizzey Code: 48 8b 80 88 00 00 00 48 89 44 24 10 65 48 8b 04 25 18 00 00 -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From jeff at intersystems.com Sun Jul 4 14:41:58 2004 From: jeff at intersystems.com (Jeff) Date: Sun, 4 Jul 2004 10:41:58 -0400 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <20040703160047.GD8257@redhat.com> References: <104121513.20040703103356@intersystems.com> <20040703160047.GD8257@redhat.com> Message-ID: <7810099096.20040704104158@intersystems.com> Saturday, July 3, 2004, 12:00:47 PM, David Teigland wrote: >> GDLM is not listed as a client of FENCE. This seems to imply >> that a GDLM application has to interact directly with FENCE to >> deal with the unknown state problem in a 2 node cluster where each >> member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28) >> as otherwise the same lockspace could end up existing on multiple >> machines in a single cluster. How would an application interact >> with FENCE to prevent this or does this have to be handled by >> configuring the cluster to reboot in this case? > This is the quickest one to answer right off the bat. We'll get to the others > over the next few days I expect. > Fencing is a service that runs on its own in a CMAN cluster; it's entirely > independent from other services. GFS simply checks to verify fencing is > running before allowing a mount since it's especially dangerous for a mount to > succeed without it. > As soon as a node joins a fencing domain it will be fenced by another domain > member if it fails. i.e. as soon as a node runs: >> cman_tool join (joins the cluster) >> fence_tool join (starts fenced which joins the default fence domain) > it will be fenced by other domain members if it fails. So, you simply need to > configure your nodes to run fence_tool join after joining the cluster if you > want fencing to happen. You can add any checks later on that you think are > necessary to be sure that the node is in the fence domain. (Looking at > /proc/cluster/services is one way.) > Running fence_tool leave will remove a node cleanly from the fence domain (it > won't be fenced by other members.) > One note of warning. If the fence daemon (fenced) process is killed on node X, > it appears to fenced processes on other nodes that X has left the domain > cleanly (just as if it had run fence_tool leave). X only leaves the domain > "uncleanly" when the node itself fails (meaning the cluster manager decides X > has failed.) There is some further development planned to address this. I understand the above but its still not clear to me how a locking application would get fenced. On startup the application could check that the cluster member has joined the fence domain. This will ensure that it gets fenced if something goes wrong. What's not clear is how the fence process will shut down (or suspend) the locking application while fencing the node. Fencing seems to be related to blocking access to I/O devices. From lists at wikidev.net Sun Jul 4 19:27:04 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Sun, 04 Jul 2004 21:27:04 +0200 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088849245.724.224.camel@squizzey> References: <1088813756.2249.15.camel@venus> <1088849245.724.224.camel@squizzey> Message-ID: <1088969224.1246.10.camel@venus> On Sat, 2004-07-03 at 11:07 +0100, Gareth Bult wrote: > Hi, > > ccsd starts a number of threads .. > > pick the one eating the CPU and "strace -p" it .. 
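On a 2.6 kernel the clones ccsd spawns should also be listed under /proc, even when ps and top hide them, so one rough way to find and attach to the spinning one (the <tid> below is a placeholder) is:

    # thread/clone ids appear under /proc/<pid>/task/ on 2.6
    ls /proc/$(pidof ccsd)/task/
    # attach strace to the suspect thread id; -tt timestamps every call
    strace -tt -p <tid>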
Thanks for this tip, i found some information that might be useful. There's only one thread created that doesn't show up in ps aux or top, but it's possible to connect to it by using strace -p pid-of-parent+1. Output is heaps of lines like this, constantly looping/scrolling: socket(PF_BLUETOOTH, SOCK_DGRAM, 3) = -1 ENETDOWN (Network is down) So i suspected some weird Bluetooth/GFS interaction. Recompiled the kernel with Bluetooth support disabled, but same thing. ccs_test seems to work however, the results returned are correct. I've since double- checked cluster.xml a few times, that's very likely not the reason (posted it at http://dl.aulinx.de/gfs/cluster.xml). -- Gabriel Wicke From teigland at redhat.com Mon Jul 5 02:39:47 2004 From: teigland at redhat.com (David Teigland) Date: Mon, 5 Jul 2004 10:39:47 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <7810099096.20040704104158@intersystems.com> References: <104121513.20040703103356@intersystems.com> <20040703160047.GD8257@redhat.com> <7810099096.20040704104158@intersystems.com> Message-ID: <20040705023947.GA6629@redhat.com> > I understand the above but its still not clear to me how a > locking application would get fenced. On startup the application > could check that the cluster member has joined the fence domain. > This will ensure that it gets fenced if something goes wrong. > > What's not clear is how the fence process will shut down (or > suspend) the locking application while fencing the node. Fencing > seems to be related to blocking access to I/O devices. I'm not entirely sure what you're asking, but I hope a long and broad answer might answer it. say there's a two node cluster of nodes A and B both nodes are running cman, fence, dlm and some application using the dlm 1. node A: hangs and is unresponsive 2. node B: cman detects that A has failed 3. node B: all cluster services are stopped/suspended (these services are fence and dlm in this example) 4. node B: while dlm service is stopped, it blocks all lock requests 5. node B: cluster still has quorum because of special "two_node" config 6. node B: fence service is started/enabled 7. node B: fence service fences node A 8. node B: dlm service is started/enabled 9. node B: dlm service recovers the application's lock space and lock requests proceed as usual If the fencing method in step 7 only blocks access to i/o devices from node A, node A could potentially "revive" and continue running. The dlm on node B no longer accepts A as a member of the lockspace so any dlm messages from A will be ignored by B. Depending on the application this may not be sufficient to prevent a revived node A from causing problems. If so, the simplest thing is to use a fencing method that resets the power on node A rather than simply blocking its device i/o. -- Dave Teigland From teigland at redhat.com Mon Jul 5 03:22:08 2004 From: teigland at redhat.com (David Teigland) Date: Mon, 5 Jul 2004 11:22:08 +0800 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040705032208.GB6629@redhat.com> > Does conversion deadlock occur only when a conversion is > about to be queued and its granted/requested state is > incompatible with another lock already on the conversion queue? > (eg. there is a PR->EX conversion queued and another PR->EX > conversion is about to be queued) Yes. 
The application can't know ahead of time, of course, whether this will happen since both PR holders may convert to EX at the same time. > Other DLMs do not deliver a blocking AST to a lock which is not > on the granted queue. This means that a lock which queued for > conversion will not get a blocking AST if it is interfering with > another lock being added to the conversion queue. Does GDLM do this > as well or are blocking ASTs delivered to all locks regardless of > their state? We only send blocking asts for locks on the granted queue. This may be a change from what was written in the sca document (which has become incorrect in some places over the past 6 months.) > Possible Enhancements: > ---------------------- > The following two items are areas where GDLM appears to differ from > the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for > Linux which is derived from IBM's DLM for AIX). These differences > aren't incompatible with GFS's requirements and could be implemented > as optional behaviors. I'd be happy to work on patches for > these if they would be welcome. Yes, we'd be very happy to get patches. > GDLM is described as granting new lock requests as long as they > are compatible with the existing lock mode regardless of the > existence of a conversion queue. The other DLMs mentioned above > always queue new lock requests if there are any locks on the conversion > queue. Certain mechanisms can't be implemented without this kind of > ordering. Would it be possible to make the alternate behavior a property > of the lock space or a property of a grant request so it can be > utilized where necessary? If that's the more standard behavior we should look at making it default for us, too. Otherwise a new flag sounds appropriate. > Certain tasks are simplified if the return status of a lock indicates > whether it was granted immediately or ended up on the waiting queue. > Other DLMs which have both synchronous and asynchronous completion > mechanisms implement this via a flag which requests synchronous > completion if the lock is available, otherwise the request is queued > and the asynchronous mechanism is used. This is particularly useful > for deadman locks that control recovery to distinguish between > the first instance of a service to start and recovery conditions. > There are other (more complex) techniques to implement this but > even though GDLM is purely an asynchronous mechanism, it still would > be possible for the completion status to indicate (if requested) > whether the lock was granted immediately or not. A "flags" field in the LKSB can also be used to return information like this. -- Dave Teigland From pcaulfie at redhat.com Mon Jul 5 09:31:35 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 5 Jul 2004 10:31:35 +0100 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <104121513.20040703103356@intersystems.com> References: <104121513.20040703103356@intersystems.com> Message-ID: <20040705093135.GB30146@tykepenguin.com> On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: > These are from reviewing http://people.redhat.com/~teigland/sca.pdf > and the CVS copy of cluster/dlm/doc/libdlm.txt. > ------------------------------------------------------------------ > > If a program requests a lock on the AST side can it wait for > the lock to complete without returning from the original AST > routine? Would it use the poll/select mechanism to do this? 
In kernel space you shouldn't wait or do much work in the AST routine or you can block the kernel's AST delivery thread. You can call dlm_lock() in an AST routine though. In userspace you can do pretty much what you like in the AST routine as (by default) they run in a seperate thread - see libdlm for more details on this. > What's the best way to implement a blocking lock request in > an application where some requests are synchronous and some > are asynchronous? Use semop() after the lock request and in > the lock completion routine? Is semop() safe to call from > a thread on Linux? Would pthread_cond_wait()/pthread_cond_signal() > be better? pthreads are recommended for userspace locking. As I mentioned above libdlm uses pthreads (though you can switch this off if you want a non-threaded application and are prepared to do the work yourself). > Does conversion deadlock occur only when a conversion is > about to be queued and its granted/requested state is > incompatible with another lock already on the conversion queue? > (eg. there is a PR->EX conversion queued and another PR->EX > conversion is about to be queued) Yes > Other DLMs do not deliver a blocking AST to a lock which is not > on the granted queue. This means that a lock which queued for > conversion will not get a blocking AST if it is interfering with > another lock being added to the conversion queue. Does GDLM do this > as well or are blocking ASTs delivered to all locks regardless of > their state? Blocking ASTs are sent to locks on the granted queue. > > libdlm.txt has a vague comment which reads: > One further point about lockspace operations is that there is no locking > on the creating/destruction of lockspaces in the library so it is up to > the application to only call dlm_*_lockspace when it is sure that > no other locking operations are likely to be happening. > Does this mean 'no other locking operations' by the process which is > creating the lockspace? no other requests to create a lock space on > that cluster member? in the cluster as a whole? No other locking operation in that process tree. > > Possible Enhancements: > ---------------------- > The following two items are areas where GDLM appears to differ from > the DLMs from HP and IBM (eg for VMS, Tru64, AIX and OpenDLM for > Linux which is derived from IBM's DLM for AIX). These differences > aren't incompatible with GFS's requirements and could be implemented > as optional behaviors. I'd be happy to work on patches for > these if they would be welcome. > > GDLM is described as granting new lock requests as long as they > are compatible with the existing lock mode regardless of the > existence of a conversion queue. The other DLMs mentioned above > always queue new lock requests if there are any locks on the conversion > queue. Certain mechanisms can't be implemented without this kind of > ordering. Would it be possible to make the alternate behavior a property > of the lock space or a property of a grant request so it can be > utilized where necessary? I'm ashamed to admit I didn't know this - we can add it as a lockspace option I think. > Certain tasks are simplified if the return status of a lock indicates > whether it was granted immediately or ended up on the waiting queue. > Other DLMs which have both synchronous and asynchronous completion > mechanisms implement this via a flag which requests synchronous > completion if the lock is available, otherwise the request is queued > and the asynchronous mechanism is used. 
This is particularly useful > for deadman locks that control recovery to distinguish between > the first instance of a service to start and recovery conditions. > There are other (more complex) techniques to implement this but > even though GDLM is purely an asynchronous mechanism, it still would > be possible for the completion status to indicate (if requested) > whether the lock was granted immediately or not. Off-hand I'm not sure how complex this would be to implement, I'll have a think about it. -- patrick From rmayhew at mweb.com Mon Jul 5 09:50:44 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Mon, 5 Jul 2004 11:50:44 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B28EA@mwjdc2.mweb.com> Hi Thanks for the quick response. I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 SAN. I grabbed the 3 RPMS from ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ Should I be downloading them for another source? Is there support yet for the latest ES 3.0 kernel? After rebuilding these RPMS' I only end up with the following. GFS-6.0.0-1.2.i386.rpm GFS-6.0.0-1.2.src.rpm GFS-debuginfo-6.0.0-1.2.i386.rpm GFS-devel-6.0.0-1.2.i386.rpm GFS-modules-6.0.0-1.2.i386.rpm perl-Net-Telnet-3.03-2.noarch.rpm perl-Net-Telnet-3.03-2.src.rpm rh-gfs-en-6.0-4.noarch.rpm rh-gfs-en-6.0-4.src.rpm This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm Thanks. -----Original Message----- From: Adam Manthei [mailto:amanthei at redhat.com] Sent: 02 July 2004 03:37 PM To: Discussion of clustering software components including GFS Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 On Fri, Jul 02, 2004 at 03:18:46PM +0200, Richard Mayhew wrote: > Hi All, > > I am running RedHat ES3.0 with the kernel 2.4.21-15 (I had to downgrade > from 2.4.21-15.02 to be able to install the GFS RPMS). > I build and installed the supplied ES 3.0 RPMS, but when it comes to > doing the depmod -a > > I end up with this. > #depmod -a > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/block/gnbd/gnbd_serv.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/drivers/md/pool/pool.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs/gfs.o > Jul 2 15:05:40 store-01 depmod: depmod: *** Unresolved symbols in > /lib/modules/2.4.21-15.EL/kernel/fs/gfs_locking/lock_gulm/lock_gulm.o > > Does any one have any pointers? Make sure the kernel versions and architectures match. For example, if your kernel is i686 SMP, then make sure you have i686 SMP gfs modules too. 
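A quick way to check that on the box itself (nothing GFS-specific here, just uname plus an rpm query; the GFS-modules package name is taken from the listing earlier in this thread):

uname -r
rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel kernel-smp GFS-modules

If the running kernel is, say, 2.4.21-15.ELsmp on i686, the GFS-modules package should be an i686 build against that exact release; otherwise depmod will report unresolved symbols just like the output above.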
> > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From amir at datacore.ch Sun Jul 4 23:37:16 2004 From: amir at datacore.ch (Amir Guindehi) Date: Mon, 05 Jul 2004 01:37:16 +0200 Subject: [Linux-cluster] GFS Dokumentation: GFS Installation / GNBD Usage / GFS Benchmarks Message-ID: <40E894AC.50703@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, I've consolidated the available GFS dokumentation as well as wrote and added some documentation of my own to: https://open.datacore.ch/page/GFS I hope, this can be of use to others. Regards, - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA6JSpbycOjskSVCwRAn6YAJ4s2MlB/Kcs6YtkMCwfSwUIgAMUdgCeKS6t 8J/zjBGbTd5W7pTPIfZoHgA= =YbYK -----END PGP SIGNATURE----- From pcaulfie at redhat.com Mon Jul 5 13:38:46 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 5 Jul 2004 14:38:46 +0100 Subject: [Linux-cluster] ccsd hanging after start on debian unstable In-Reply-To: <1088969224.1246.10.camel@venus> References: <1088813756.2249.15.camel@venus> <1088849245.724.224.camel@squizzey> <1088969224.1246.10.camel@venus> Message-ID: <20040705133845.GI30146@tykepenguin.com> On Sun, Jul 04, 2004 at 09:27:04PM +0200, Gabriel Wicke wrote: > On Sat, 2004-07-03 at 11:07 +0100, Gareth Bult wrote: > > Hi, > > > > ccsd starts a number of threads .. > > > > pick the one eating the CPU and "strace -p" it .. > > Thanks for this tip, i found some information that might be useful. > There's only one thread created that doesn't show up in ps aux or top, > but it's possible to connect to it by using strace -p pid-of-parent+1. > Output is heaps of lines like this, constantly looping/scrolling: > > socket(PF_BLUETOOTH, SOCK_DGRAM, 3) = -1 ENETDOWN (Network is down) > > So i suspected some weird Bluetooth/GFS interaction. Recompiled the > kernel with Bluetooth support disabled, but same thing. ccs_test seems > to work however, the results returned are correct. I've since double- > checked cluster.xml a few times, that's very likely not the reason > (posted it at http://dl.aulinx.de/gfs/cluster.xml). The bluetooth thing is a red-herring. The cluster socket type clashes with AF_BLUETOOTH and strace knows about the "real" one. we need to register the AF_type properly. CCS seems to poll for the cluster to be ready so it can enable updates - maybe it's just doing that rather too enthusiatically :-) patrick From mailing-lists at hughesjr.com Tue Jul 6 02:59:24 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 05 Jul 2004 21:59:24 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <1089082764.5974.9.camel@Myth.home.local> Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. 
> >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. > > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmayhew at mweb.com Tue Jul 6 07:48:38 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 6 Jul 2004 09:48:38 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> hi I tried this some time ago and ended up with this.. Installing GFS-6.0.0-1.2.src.rpm Building target platforms: i686 Building for target i686 Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + LANG=C + export LANG + unset DISPLAY + echo ping ping + cd /usr/src/redhat/BUILD + rm -rf GFS-6.0.0 + /bin/mkdir -p GFS-6.0.0 + cd GFS-6.0.0 + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz + tar -xf - + STATUS=0 + '[' 0 -ne 0 ']' ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,g-w,o-w . + echo pong pong + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + cd GFS-6.0.0 + LANG=C + export LANG + unset DISPLAY ++ pwd + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 + BuildSistina i686 hugemem + cpu_type=i686 + flavor=hugemem + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' + echo 'Kernel not found.' Kernel not found. + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL /lib/modules/2.4.21-9.0.3.EL /lib/modules/2.4.21-15.EL: build misc modules.generic_string modules.isapnpmap modules.pcimap modules.usbmap kernel modules.dep modules.ieee1394map modules.parportmap modules.pnpbiosmap /lib/modules/2.4.21-4.EL: updates /lib/modules/2.4.21-9.0.3.EL: misc + exit 1 error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) Any ideas? ________________________________ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 04:59 AM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. 
> > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm From mailing-lists at hughesjr.com Tue Jul 6 12:22:26 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Tue, 06 Jul 2004 07:22:26 -0500 Subject: [Linux-cluster] GFS on RedHat ES 3.0 In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2A54@mwjdc2.mweb.com> Message-ID: <1089116546.19333.122.camel@Myth.home.local> For building purposes, install the packages kernel, kernel-source, kernel-smp, kernel-hugemem. Then do the --target i686 command. (After you have finished building, you can remove all the kernels except the one you need to boot). Also, if you want to build this on the 2.4.21-15.0.3.EL kernel, you can download a modified source file from me that builds against that kernel (it builds against 15.EL, 15.0.2.EL and 15.0.3.EL). I built the i686 rpms, which you can download from me and try if you want (see the link below). The src.rpm file should build on any kernel where the name is 2.4.21-15.EL, 2.4.21-15.0.2.EL, or 2.4.21-15.0.3.EL. (must have kernel, kernel-source, kernel-smp, kernel-hugemem installed to build). My RPMS were built on a WBEL machine, but it shouldn't make any difference. They will only install on a 2.4.21-15.0.3.EL kernel... GFS Downloads Johnny Hughes HughesJR.com On Tue, 2004-07-06 at 02:48, Richard Mayhew wrote: > hi > I tried this some time ago and ended up with this.. > > > Installing GFS-6.0.0-1.2.src.rpm > Building target platforms: i686 > Building for target i686 > Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 > + umask 022 > + cd /usr/src/redhat/BUILD > + LANG=C > + export LANG > + unset DISPLAY > + echo ping > ping > + cd /usr/src/redhat/BUILD > + rm -rf GFS-6.0.0 > + /bin/mkdir -p GFS-6.0.0 > + cd GFS-6.0.0 > + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz > + tar -xf - > + STATUS=0 > + '[' 0 -ne 0 ']' > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chown -Rhf root . > ++ /usr/bin/id -u > + '[' 0 = 0 ']' > + /bin/chgrp -Rhf root . > + /bin/chmod -Rf a+rX,g-w,o-w . > + echo pong > pong > + exit 0 > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 > + umask 022 > + cd /usr/src/redhat/BUILD > + cd GFS-6.0.0 > + LANG=C > + export LANG > + unset DISPLAY > ++ pwd > + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 > + BuildSistina i686 hugemem > + cpu_type=i686 > + flavor=hugemem > + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build > + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' > + echo 'Kernel not found.' > Kernel not found. > + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL > /lib/modules/2.4.21-9.0.3.EL > /lib/modules/2.4.21-15.EL: > build misc modules.generic_string modules.isapnpmap > modules.pcimap modules.usbmap > kernel modules.dep modules.ieee1394map modules.parportmap > modules.pnpbiosmap > > /lib/modules/2.4.21-4.EL: > updates > > /lib/modules/2.4.21-9.0.3.EL: > misc > + exit 1 > error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) > > > RPM build errors: > Bad exit status from /var/tmp/rpm-tmp.81824 (%build) > > > Any ideas? 
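Spelling the suggested procedure out as a command sequence (a sketch only; how you pull the extra kernel packages onto the ES 3.0 box depends on how it is subscribed, so the up2date line is just one way to do it):

# install every kernel flavour the GFS spec file builds against
up2date --install kernel kernel-source kernel-smp kernel-hugemem
# (or install the matching kernel RPMs by hand with rpm -ivh)

# then rebuild the source RPM for i686
rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm

Once the build has finished, the kernel flavours you do not actually boot can be removed again.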
> ________________________________ > > From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] > Sent: 06 July 2004 04:59 AM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 > > > Richard, > Try this: > > rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm > > > Johnny Hughes > HughesJR.com > > > -----Original Message----- > From: "Richard Mayhew" > To: "Discussion of clustering software components including GFS" > > Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 > Date: Mon, 5 Jul 2004 11:50:44 +0200 > > >Hi > >Thanks for the quick response. > > > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 > >SAN. > > > >I grabbed the 3 RPMS from > >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > >Should I be downloading them for another source? Is there support yet > >for the latest ES 3.0 kernel? > > > >After rebuilding these RPMS' I only end up with the following. > > > > > >GFS-6.0.0-1.2.i386.rpm > >GFS-6.0.0-1.2.src.rpm > >GFS-debuginfo-6.0.0-1.2.i386.rpm > >GFS-devel-6.0.0-1.2.i386.rpm > >GFS-modules-6.0.0-1.2.i386.rpm > >perl-Net-Telnet-3.03-2.noarch.rpm > >perl-Net-Telnet-3.03-2.src.rpm > >rh-gfs-en-6.0-4.noarch.rpm > >rh-gfs-en-6.0-4.src.rpm > > > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmayhew at mweb.com Tue Jul 6 12:43:01 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 6 Jul 2004 14:43:01 +0200 Subject: [Linux-cluster] GFS on RedHat ES 3.0 Message-ID: <91C4F1A7C418014D9F88E938C13554584B2AD6@mwjdc2.mweb.com> Ta very much, let me get stuck in here with your RPM and give it a bash. _____ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 02:22 PM To: linux-cluster at redhat.com Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 For building purposes, install the packages kernel, kernel-source, kernel-smp, kernel-hugemem. Then do the --target i686 command. (After you have finished building, you can remove all the kernels except the one you need to boot). Also, if you want to build this on the 2.4.21-15.0.3.EL kernel, you can download a modified source file from me that builds against that kernel (it builds against 15.EL, 15.0.2.EL and 15.0.3.EL). I built the i686 rpms, which you can download from me and try if you want (see the link below). The src.rpm file should build on any kernel where the name is 2.4.21-15.EL, 2.4.21-15.0.2.EL, or 2.4.21-15.0.3.EL. (must have kernel, kernel-source, kernel-smp, kernel-hugemem installed to build). My RPMS were built on a WBEL machine, but it shouldn't make any difference. They will only install on a 2.4.21-15.0.3.EL kernel... GFS Downloads Johnny Hughes HughesJR.com On Tue, 2004-07-06 at 02:48, Richard Mayhew wrote: hi I tried this some time ago and ended up with this.. Installing GFS-6.0.0-1.2.src.rpm Building target platforms: i686 Building for target i686 Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + LANG=C + export LANG + unset DISPLAY + echo ping ping + cd /usr/src/redhat/BUILD + rm -rf GFS-6.0.0 + /bin/mkdir -p GFS-6.0.0 + cd GFS-6.0.0 + /usr/bin/gzip -dc /usr/src/redhat/SOURCES/gfs-build.tar.gz + tar -xf - + STATUS=0 + '[' 0 -ne 0 ']' ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chown -Rhf root . ++ /usr/bin/id -u + '[' 0 = 0 ']' + /bin/chgrp -Rhf root . + /bin/chmod -Rf a+rX,g-w,o-w . 
+ echo pong pong + exit 0 Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.81824 + umask 022 + cd /usr/src/redhat/BUILD + cd GFS-6.0.0 + LANG=C + export LANG + unset DISPLAY ++ pwd + BUILD_TOPDIR=/usr/src/redhat/BUILD/GFS-6.0.0 + BuildSistina i686 hugemem + cpu_type=i686 + flavor=hugemem + kernel_src=/lib/modules/2.4.21-15.ELhugemem/build + '[' -d /lib/modules/2.4.21-15.ELhugemem/build/. ']' + echo 'Kernel not found.' Kernel not found. + ls /lib/modules/2.4.21-15.EL /lib/modules/2.4.21-4.EL /lib/modules/2.4.21-9.0.3.EL /lib/modules/2.4.21-15.EL: build misc modules.generic_string modules.isapnpmap modules.pcimap modules.usbmap kernel modules.dep modules.ieee1394map modules.parportmap modules.pnpbiosmap /lib/modules/2.4.21-4.EL: updates /lib/modules/2.4.21-9.0.3.EL: misc + exit 1 error: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.81824 (%build) Any ideas? ________________________________ From: Johnny Hughes [mailto:mailing-lists at hughesjr.com] Sent: 06 July 2004 04:59 AM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] GFS on RedHat ES 3.0 Richard, Try this: rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm Johnny Hughes HughesJR.com > -----Original Message----- From: "Richard Mayhew" To: "Discussion of clustering software components including GFS" Subject: RE: [Linux-cluster] GFS on RedHat ES 3.0 Date: Mon, 5 Jul 2004 11:50:44 +0200 >Hi >Thanks for the quick response. > >I am running on Dell 1750's (Dual P4 2.4Ghz, 2GB Ram) using a EMC CX600 >SAN. > >I grabbed the 3 RPMS from >ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > >Should I be downloading them for another source? Is there support yet >for the latest ES 3.0 kernel? > >After rebuilding these RPMS' I only end up with the following. > > >GFS-6.0.0-1.2.i386.rpm >GFS-6.0.0-1.2.src.rpm >GFS-debuginfo-6.0.0-1.2.i386.rpm >GFS-devel-6.0.0-1.2.i386.rpm >GFS-modules-6.0.0-1.2.i386.rpm >perl-Net-Telnet-3.03-2.noarch.rpm >perl-Net-Telnet-3.03-2.src.rpm >rh-gfs-en-6.0-4.noarch.rpm >rh-gfs-en-6.0-4.src.rpm > >This was done with a rpmbuild --rebuild GFS-6.0.0-1.2.src.rpm -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrassow at redhat.com Tue Jul 6 14:24:59 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Tue, 6 Jul 2004 09:24:59 -0500 Subject: [Linux-cluster] Re: ccsd hanging after start on debian unstable In-Reply-To: <20040703160112.7160674067@hormel.redhat.com> References: <20040703160112.7160674067@hormel.redhat.com> Message-ID: <42BA51AC-CF58-11D8-A8E3-000A957BB1F6@redhat.com> This has to do with 'exit' not being able to be called from interrupt context on some systems. This problem should be fixed in cvs. Additionally, for those receiving tons of messages about a network not being found... this is due to ccsd trying to communicate with cman before it is ready. These messages should no longer be printed. 
brassow On Jul 3, 2004, at 11:01 AM, linux-cluster-request at redhat.com wrote: > ccsd hanging after start on debian unstable From wendland at scan-plus.de Tue Jul 6 18:46:53 2004 From: wendland at scan-plus.de (Joerg Wendland) Date: Tue, 6 Jul 2004 20:46:53 +0200 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040701034859.GC11996@redhat.com> References: <20040701010048.GC25028@dozer> <20040701034859.GC11996@redhat.com> Message-ID: <20040706184653.GA16139@dozer> On Thu, Jul 01, 2004 at 11:48:59AM +0800, David Teigland wrote: > > Kernel panic: lock_dlm: Record message above and reboot. > > This is a bug we know of and are working on right now. Is this fixed by the latest CVS checkins? The log messages suppose so. Thanks, Joerg -- | Entwickler Elektronische Datenverarbeitung und Dienstbetriebsmittel | | Scan-Plus GmbH Dienstbetriebsmittelherstellung fon +49-731-92013-0 | | Koenigstrasse 78, 89077 Ulm, Germany fax +49-731-92013-290 | | Geschaeftsfuehrer: Juergen Hoermann HRB 3220 Amtsgericht Ulm | | PGP-key: 51CF8417 (FP: 79C0 7671 AFC7 315E 657A F318 57A3 7FBD 51CF 8417) | From teigland at redhat.com Wed Jul 7 02:43:51 2004 From: teigland at redhat.com (David Teigland) Date: Wed, 7 Jul 2004 10:43:51 +0800 Subject: [Linux-cluster] Kernel panic in fs/gfs_locking/lock_dlm/lock.c In-Reply-To: <20040706184653.GA16139@dozer> References: <20040701010048.GC25028@dozer> <20040701034859.GC11996@redhat.com> <20040706184653.GA16139@dozer> Message-ID: <20040707024351.GA7674@redhat.com> On Tue, Jul 06, 2004 at 08:46:53PM +0200, Joerg Wendland wrote: > On Thu, Jul 01, 2004 at 11:48:59AM +0800, David Teigland wrote: > > > Kernel panic: lock_dlm: Record message above and reboot. > > > > This is a bug we know of and are working on right now. > > Is this fixed by the latest CVS checkins? The log messages suppose so. We've fixed a couple things, so it's possible, but I suspect you may just trigger an assertion earlier now based on the debugging we're still doing. -- Dave Teigland From rmayhew at mweb.com Wed Jul 7 10:23:41 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Wed, 7 Jul 2004 12:23:41 +0200 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> Hi, Could some one explain or point me in the right direction in the differences between gfs_data and gfs_journal in the pool config file. Which is the better option, and why? Thanks Richard. From mtilstra at redhat.com Wed Jul 7 14:39:08 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 7 Jul 2004 09:39:08 -0500 Subject: [Linux-cluster] Gfs_data vs gfs_journal In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2BF6@mwjdc2.mweb.com> Message-ID: <20040707143908.GA6496@redhat.com> On Wed, Jul 07, 2004 at 12:23:41PM +0200, Richard Mayhew wrote: > Could some one explain or point me in the right direction in the > differences between gfs_data and gfs_journal in the pool config file. > > Which is the better option, and why? For nearly everyone, just use gfs_data. gfs_journal is for controling where physically gfs puts the journals. (file system data goes in gfs_data, journals in gfs_journal.) The idea when we originally made this was that someon might have a really fast but small storage device, and then a bunch of more common storage. They then could use pool to combine the two devices into a single pool, putting the journal onto the faster device. 
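For illustration, a pool configuration along those lines might look roughly like this -- the field layout of the subpool/pooldevice lines is written from memory of the pool_tool/pool config man pages, so double-check it there, and the device names are made up:

poolname    pool_gfs01
subpools    2
subpool     0  0  1  gfs_journal
subpool     1  0  1  gfs_data
pooldevice  0  0  /dev/sdb1
pooldevice  1  0  /dev/sdc1

Here subpool 0 (type gfs_journal) sits on the small fast device /dev/sdb1 and subpool 1 (type gfs_data) carries the bulk of the file system on /dev/sdc1; as far as I remember the subpool fields are id, stripe size, number of devices and an optional type.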
Which then, in theory, would make gfs faster. I don't think it has ever been tested though. You have to tell mkfs.gfs to look at pool lables to use this. (I forget the cmd option, its in the man page.) If you don't tell mkfs.gfs to look at pool labels, the labels are ignored. And gfs puts the journals in the middle of the data section I think. (could be wrong on that.) Hope that helps. -- Michael Conrad Tadpol Tilstra I hate when they fix a bug that I use. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From ben.m.cahill at intel.com Wed Jul 7 15:06:37 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 08:06:37 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE5@orsmsx404.amr.corp.intel.com> They are both necessary. gfs_data is the device/partition for filesystem data (i.e. files and on-disk metadata). Each node in the cluster also needs a separate journal device/partition in which to redundantly record metadata, to enable the filesystem to recover gracefully from node failure/crash. There's some documentation about this in the OpenGFS project: opengfs.sourceforge.net/docs.php CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things (e.g. lock protocols) are different ... but the basic idea is the same. See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you understand. But remember to rely on current RedHat GFS docs for current installation, components, and capabilities info. -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Richard Mayhew > Sent: Wednesday, July 07, 2004 6:24 AM > To: Discussion of clustering software components including GFS > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > Hi, > > Could some one explain or point me in the right direction in the > differences between gfs_data and gfs_journal in the pool config file. > > Which is the better option, and why? > > Thanks > Richard. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From ben.m.cahill at intel.com Wed Jul 7 15:16:59 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 08:16:59 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com> Oops, based on Michael's response, I realized that mine might be not quite right. Both data and journal *space* are necessary, but the journals can be created, by default, within the filesystem device, with no need for gfs_journal entry in config file. BTW, OpenGFS has supported external journals for over a year at this point ... would this be a useful feature for GFS? -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cahill, Ben M > Sent: Wednesday, July 07, 2004 11:07 AM > To: linux-cluster at redhat.com > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > They are both necessary. > > gfs_data is the device/partition for filesystem data (i.e. files and > on-disk metadata). 
> > Each node in the cluster also needs a separate journal > device/partition > in which to redundantly record metadata, to enable the filesystem to > recover gracefully from node failure/crash. > > There's some documentation about this in the OpenGFS project: > > opengfs.sourceforge.net/docs.php > > CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things > (e.g. lock protocols) are different ... but the basic idea is > the same. > See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you > understand. But remember to rely on current RedHat GFS docs > for current > installation, components, and capabilities info. > > -- Ben -- > > Opinions are mine, not Intel's > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Richard Mayhew > > Sent: Wednesday, July 07, 2004 6:24 AM > > To: Discussion of clustering software components including GFS > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > Hi, > > > > Could some one explain or point me in the right direction in the > > differences between gfs_data and gfs_journal in the pool > config file. > > > > Which is the better option, and why? > > > > Thanks > > Richard. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From bruno.coudoin at storagency.com Wed Jul 7 15:47:54 2004 From: bruno.coudoin at storagency.com (Bruno Coudoin) Date: Wed, 07 Jul 2004 17:47:54 +0200 Subject: [Linux-cluster] building gfs on opteron Message-ID: <1089215274.18997.123.camel@bruno.storagency> I would like to build GFS on opterons with kernel 2.4.21. I managed to compile it on X86 using the GFS-6.0.0-1.2.src.rpm and the binary redhat kernels. Is this process appropriate for opterons as well? Bruno. From nygaard at redhat.com Wed Jul 7 16:21:26 2004 From: nygaard at redhat.com (Erling Nygaard) Date: Wed, 7 Jul 2004 11:21:26 -0500 Subject: [Linux-cluster] Gfs_data vs gfs_journal In-Reply-To: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com>; from ben.m.cahill@intel.com on Wed, Jul 07, 2004 at 08:16:59AM -0700 References: <0604335B7764D141945E20215310596002299EE8@orsmsx404.amr.corp.intel.com> Message-ID: <20040707112126.B30098@homer.msp.redhat.com> Ben You can indeed have external journals with GFS. As Mike was saying, you can specify a subpool with type "gfs_journal". And since you easily can specify what device the subpool is on you decide where the journal is. This feature has been in GFS since 'a looong time ago' and unless there have been changes to this in OpenGFS this feature works in the same way in all versions of GFS :) As Mike pointed out, this was originally done in case of Solid State Disks, where having the journals on the SSD could prove speedup. Due to lack of SSDs this has never really been tested much... Erling On Wed, Jul 07, 2004 at 08:16:59AM -0700, Cahill, Ben M wrote: > Oops, based on Michael's response, I realized that mine might be not > quite right. Both data and journal *space* are necessary, but the > journals can be created, by default, within the filesystem device, with > no need for gfs_journal entry in config file. > > BTW, OpenGFS has supported external journals for over a year at this > point ... would this be a useful feature for GFS? 
> > -- Ben -- > > Opinions are mine, not Intel's > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Cahill, Ben M > > Sent: Wednesday, July 07, 2004 11:07 AM > > To: linux-cluster at redhat.com > > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > > > They are both necessary. > > > > gfs_data is the device/partition for filesystem data (i.e. files and > > on-disk metadata). > > > > Each node in the cluster also needs a separate journal > > device/partition > > in which to redundantly record metadata, to enable the filesystem to > > recover gracefully from node failure/crash. > > > > There's some documentation about this in the OpenGFS project: > > > > opengfs.sourceforge.net/docs.php > > > > CAUTION: OpenGFS is *not* the same as current RedHat GFS; many things > > (e.g. lock protocols) are different ... but the basic idea is > > the same. > > See WHATIS-OpenGFS, and HOWTO-generic, just to see if they help you > > understand. But remember to rely on current RedHat GFS docs > > for current > > installation, components, and capabilities info. > > > > -- Ben -- > > > > Opinions are mine, not Intel's > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > > Richard Mayhew > > > Sent: Wednesday, July 07, 2004 6:24 AM > > > To: Discussion of clustering software components including GFS > > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > Hi, > > > > > > Could some one explain or point me in the right direction in the > > > differences between gfs_data and gfs_journal in the pool > > config file. > > > > > > Which is the better option, and why? > > > > > > Thanks > > > Richard. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Erling Nygaard nygaard at redhat.com Red Hat Inc From Remi.Nivet at atosorigin.com Wed Jul 7 17:26:12 2004 From: Remi.Nivet at atosorigin.com (=?iso-8859-1?Q?Nivet_R=E9mi?=) Date: Wed, 7 Jul 2004 19:26:12 +0200 Subject: [Linux-cluster] GFS data access failover Message-ID: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> Hi everyone, I successfully set up a 2-nodes cluster after three days of hard work looking for docs and crashing servers, but it finaly worked and I'm able to export GFS partition from one node to the other using gnbd ;-) Now my question is : is there any way to use 2 dataservers (maybe replicate data between them but I can manage that on myself) and use a failover (or round-robin) mechanism so that if one of the dataserver crash, my clients can still access the data from the other dataserver ? As an optional question : I'm trying to use clvmd to propagate lvm config from one node to the other, but when I try to create LV, I've got the following error : Error locking on node XXXX: Internal lvm error, check syslog Failed to activate new LV. and the only log I have on the remote node is : lvm[721]: Volume group for uuid not found: LWuaVGYKeELfrxE16CgW0pUOAilU4CSNmLXVJ3b7i5AsjhWgfszorCRkn5KRCQTU anyone knows what I'm missing ? Thanks, R?mi. 
From ben.m.cahill at intel.com Wed Jul 7 21:01:16 2004 From: ben.m.cahill at intel.com (Cahill, Ben M) Date: Wed, 7 Jul 2004 14:01:16 -0700 Subject: [Linux-cluster] Gfs_data vs gfs_journal Message-ID: <0604335B7764D141945E20215310596002299EE9@orsmsx404.amr.corp.intel.com> Yes, you're correct ... however, the idea in OpenGFS was to not rely on pool, but to allow other "generic" volume managers to be used instead. -- Ben -- Opinions are mine, not Intel's > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Erling Nygaard > Sent: Wednesday, July 07, 2004 12:21 PM > To: Discussion of clustering software components including GFS > Subject: Re: [Linux-cluster] Gfs_data vs gfs_journal > > Ben > > You can indeed have external journals with GFS. > > As Mike was saying, you can specify a subpool with type "gfs_journal". > And since you easily can specify what device the subpool is > on you decide > where the journal is. > > This feature has been in GFS since 'a looong time ago' and > unless there > have been changes to this in OpenGFS this feature works in > the same way in > all versions of GFS :) > > As Mike pointed out, this was originally done in case of Solid State > Disks, where having the journals on the SSD could prove > speedup. Due to > lack of SSDs this has never really been tested much... > > > Erling > > > > On Wed, Jul 07, 2004 at 08:16:59AM -0700, Cahill, Ben M wrote: > > Oops, based on Michael's response, I realized that mine might be not > > quite right. Both data and journal *space* are necessary, but the > > journals can be created, by default, within the filesystem > device, with > > no need for gfs_journal entry in config file. > > > > BTW, OpenGFS has supported external journals for over a year at this > > point ... would this be a useful feature for GFS? > > > > -- Ben -- > > > > Opinions are mine, not Intel's > > > > > -----Original Message----- > > > From: linux-cluster-bounces at redhat.com > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Cahill, Ben M > > > Sent: Wednesday, July 07, 2004 11:07 AM > > > To: linux-cluster at redhat.com > > > Subject: RE: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > They are both necessary. > > > > > > gfs_data is the device/partition for filesystem data > (i.e. files and > > > on-disk metadata). > > > > > > Each node in the cluster also needs a separate journal > > > device/partition > > > in which to redundantly record metadata, to enable the > filesystem to > > > recover gracefully from node failure/crash. > > > > > > There's some documentation about this in the OpenGFS project: > > > > > > opengfs.sourceforge.net/docs.php > > > > > > CAUTION: OpenGFS is *not* the same as current RedHat > GFS; many things > > > (e.g. lock protocols) are different ... but the basic idea is > > > the same. > > > See WHATIS-OpenGFS, and HOWTO-generic, just to see if > they help you > > > understand. But remember to rely on current RedHat GFS docs > > > for current > > > installation, components, and capabilities info. 
> > > > > > -- Ben -- > > > > > > Opinions are mine, not Intel's > > > > > > > -----Original Message----- > > > > From: linux-cluster-bounces at redhat.com > > > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > > > Richard Mayhew > > > > Sent: Wednesday, July 07, 2004 6:24 AM > > > > To: Discussion of clustering software components including GFS > > > > Subject: [Linux-cluster] Gfs_data vs gfs_journal > > > > > > > > Hi, > > > > > > > > Could some one explain or point me in the right direction in the > > > > differences between gfs_data and gfs_journal in the pool > > > config file. > > > > > > > > Which is the better option, and why? > > > > > > > > Thanks > > > > Richard. > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Erling Nygaard > nygaard at redhat.com > > Red Hat Inc > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > From notiggy at gmail.com Wed Jul 7 22:24:08 2004 From: notiggy at gmail.com (Brian Jackson) Date: Wed, 7 Jul 2004 17:24:08 -0500 Subject: [Linux-cluster] GFS data access failover In-Reply-To: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> References: <1AD1E96E6289744CB6979EC49DA554C4130F79@srv-grp-s08.dev.atos.fr> Message-ID: On Wed, 7 Jul 2004 19:26:12 +0200, Nivet R?mi wrote: > Hi everyone, > > I successfully set up a 2-nodes cluster after three days of hard work looking for docs and crashing servers, but it finaly worked and I'm able to export GFS partition from one node to the other using gnbd ;-) > > Now my question is : is there any way to use 2 dataservers (maybe replicate data between them but I can manage that on myself) and use a failover (or round-robin) mechanism so that if one of the dataserver crash, my clients can still access the data from the other dataserver ? > Nope, this is frequently asked. You need some sort of cluster aware mirroring. The kernels' md drivers aren't even close. I believe some people are currently looking at/working on this --Brian > As an optional question : I'm trying to use clvmd to propagate lvm config from one node to the other, but when I try to create LV, I've got the following error : > > Error locking on node XXXX: Internal lvm error, check syslog > Failed to activate new LV. > > and the only log I have on the remote node is : > > lvm[721]: Volume group for uuid not found: LWuaVGYKeELfrxE16CgW0pUOAilU4CSNmLXVJ3b7i5AsjhWgfszorCRkn5KRCQTU > > anyone knows what I'm missing ? > > Thanks, > R?mi. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From rmayhew at mweb.com Thu Jul 8 12:27:38 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Thu, 8 Jul 2004 14:27:38 +0200 Subject: [Linux-cluster] GFS Performance Message-ID: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> Hi I setup 2 nodes, on my EMC SAN. Both nodes see the storage and can access the cca device. When writing a file to the storage fs, the second node takes a couple of seconds to see the changes. Ie. 1. 
Node 1 Creates the file "dd if=/dev/zero of=test.file bs=4096 count=10240000" 2. Doing a ls -la on node 2 takes a few seconds to display the contents of the dir. After the file has finished being updates, all listings of that dir are quick, but if any changes are made, one again has to wait for the system to display the contents of the dir. Any idea? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za From jeff at intersystems.com Thu Jul 8 13:09:49 2004 From: jeff at intersystems.com (Jeff) Date: Thu, 8 Jul 2004 09:09:49 -0400 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <20040705093135.GB30146@tykepenguin.com> References: <104121513.20040703103356@intersystems.com> <20040705093135.GB30146@tykepenguin.com> Message-ID: <14710269533.20040708090949@intersystems.com> Monday, July 5, 2004, 5:31:35 AM, Patrick Caulfield wrote: > On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: >> These are from reviewing http://people.redhat.com/~teigland/sca.pdf >> and the CVS copy of cluster/dlm/doc/libdlm.txt. >> ------------------------------------------------------------------ >> >> If a program requests a lock on the AST side can it wait for >> the lock to complete without returning from the original AST >> routine? Would it use the poll/select mechanism to do this? > In kernel space you shouldn't wait or do much work in the AST routine or > you can block the kernel's AST delivery thread. You can call dlm_lock() in an > AST routine though. > In userspace you can do pretty much what you like in the AST routine as (by > default) they run in a seperate thread - see libdlm for more details on this. Thanks for the answers. One more question about acquiring locks in the worker thread. Assuming I'm using pthreads if I want to call dlm_lock() in the worker thread (AST routine) and I need to wait for that dlm_lock() call to complete would I call dlm_get_fd() and loop calling dlm_dispatch() until I see that the lock completes? This involves dlm_dispatch() calling itself recursively which I assume is going to be ok. Can you call dlm_pthread_init() more than once to start multiple service threads? I have some routines which are called both by the mainline code and as an AST routine. I'm wondering if I need them to be aware of how they're called or whether they can always simply open the fd and use dlm_dispatch if they need to issue a 'blocking' dlm_lock() call. From pcaulfie at redhat.com Thu Jul 8 13:38:02 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 8 Jul 2004 14:38:02 +0100 Subject: [Linux-cluster] Some GDLM questions In-Reply-To: <14710269533.20040708090949@intersystems.com> References: <104121513.20040703103356@intersystems.com> <20040705093135.GB30146@tykepenguin.com> <14710269533.20040708090949@intersystems.com> Message-ID: <20040708133800.GF7680@tykepenguin.com> On Thu, Jul 08, 2004 at 09:09:49AM -0400, Jeff wrote: > Monday, July 5, 2004, 5:31:35 AM, Patrick Caulfield wrote: > > > On Sat, Jul 03, 2004 at 10:33:56AM -0400, Jeff wrote: > >> These are from reviewing http://people.redhat.com/~teigland/sca.pdf > >> and the CVS copy of cluster/dlm/doc/libdlm.txt. > >> ------------------------------------------------------------------ > >> > >> If a program requests a lock on the AST side can it wait for > >> the lock to complete without returning from the original AST > >> routine? Would it use the poll/select mechanism to do this? 
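For what it is worth, the fd/dispatch approach described above can be sketched like this. The dlm_lock() argument order, the struct dlm_lksb field names and the dlm_get_fd()/dlm_dispatch() usage are written from memory of libdlm.txt and may not match the headers exactly, and it assumes the default lock space is already set up, so treat it as an outline rather than tested code:

#include <string.h>
#include <poll.h>
#include <libdlm.h>

struct sync_req {
    struct dlm_lksb lksb;   /* sb_lkid/sb_status filled in on completion */
    int done;               /* set by the completion AST below           */
};

/* Completion AST: runs when dlm_dispatch() delivers our result. */
static void sync_ast(void *arg)
{
    struct sync_req *req = arg;
    req->done = 1;
}

/* Issue an asynchronous request, then loop on dlm_dispatch() until our
 * own request has completed.  Any other pending ASTs get delivered
 * along the way, which is the re-entrancy point raised above. */
static int lock_and_wait(int mode, const char *name)
{
    struct sync_req req;
    struct pollfd pfd;
    int fd = dlm_get_fd();
    int rv;

    memset(&req, 0, sizeof(req));
    rv = dlm_lock(mode, &req.lksb, 0 /* flags */,
                  name, strlen(name), 0 /* parent */,
                  sync_ast, &req,   /* completion AST + argument       */
                  NULL,             /* no blocking AST in this sketch  */
                  NULL);            /* no range                        */
    if (rv)
        return rv;                  /* request was never queued */

    while (!req.done) {
        pfd.fd = fd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, -1) < 0)  /* wait for the DLM fd to go readable */
            return -1;
        dlm_dispatch(fd);           /* read it and run the ASTs */
    }
    return req.lksb.sb_status;      /* 0 if the lock was granted */
}

Called from mainline code as, say, lock_and_wait(LKM_PRMODE, "my_resource"); calling it from inside an AST routine is exactly the nested-dispatch case asked about above.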
> > > In kernel space you shouldn't wait or do much work in the AST routine or > > you can block the kernel's AST delivery thread. You can call dlm_lock() in an > > AST routine though. > > > In userspace you can do pretty much what you like in the AST routine as (by > > default) they run in a seperate thread - see libdlm for more details on this. > > Thanks for the answers. One more question about acquiring locks in > the worker thread. > > Assuming I'm using pthreads if I want to call dlm_lock() in the worker > thread (AST routine) and I need to wait for that dlm_lock() call to complete > would I call dlm_get_fd() and loop calling dlm_dispatch() until I see that > the lock completes? This involves dlm_dispatch() calling itself > recursively which I assume is going to be ok. That should work fine, though I must admit I haven't tried it. The routines are all re-entrant (of course). > Can you call dlm_pthread_init() more than once to start multiple > service threads? I have some routines which are called > both by the mainline code and as an AST routine. I'm wondering if I > need them to be aware of how they're called or whether they can > always simply open the fd and use dlm_dispatch if they need to > issue a 'blocking' dlm_lock() call. Currently you can't have multiple dispatch routines by using library's pthread_init calls. The threads don't really care whether they are AST threads or work threads - all they do is read data from the DLM's fd and call the routine specified by astaddr. I suppose one hazard of issuing more lock requests in the AST routine is that you will need to keep a track of which lock requests you have had ASTs for - in cast mainline issues any more lock requests in the meantime. You will have to make sure that they get dispatched as well as your nested one. If course this will happen in dlm_dispatch() but you don't know whose lock has been dispatched each time. Things might be a little clearer if you had a look inside libdlm.c - it's actually quite as simple little library. One warning...try to avoid calling the kernel bits yourself; I don't want to change the userland/kernel API but if it becomes necessary the library will always be modified to cope. -- patrick From madmax at iskon.hr Thu Jul 8 14:47:24 2004 From: madmax at iskon.hr (Kresimir Kukulj) Date: Thu, 8 Jul 2004 16:47:24 +0200 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: <20040708144724.GB18751@max.zg.iskon.hr> What is the difference/development status of RedHat's (sistina) GNBD compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge project page that OpenGFS GNBD was not updated since 2002. I also found NBD, ENBD, DRBD but these don't support client nodes to be mounted (even read-only) if master node is using the device. Is there any other technology (software) that can export a block device from 1 master to couple of slave nodes ? Read only access on client nodes is good enough. Is anyone using some kind of network block device in production, and with what success ? Thanks. -- Kresimir Kukulj madmax at iskon.hr +--------------------------------------------------+ Old PC's never die. They just become Unix terminals. From notiggy at gmail.com Thu Jul 8 18:46:01 2004 From: notiggy at gmail.com (Brian Jackson) Date: Thu, 8 Jul 2004 13:46:01 -0500 Subject: [Linux-cluster] GNBD, how good it is ? 
In-Reply-To: <20040708144724.GB18751@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: On Thu, 8 Jul 2004 16:47:24 +0200, Kresimir Kukulj wrote: > > What is the difference/development status of RedHat's (sistina) GNBD > compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge > project page that OpenGFS GNBD was not updated since 2002. Once we (OpenGFS) found out that there were other alternatives, we thought we had more important things to do than to maintain gnbd (with our limited resources). It was deprecated some time ago. GFS's code however has been maintained, and updated to newer kernels, etc. GFS's is by far a better choice as far as those two are concerned. > > I also found NBD, ENBD, DRBD but these don't support client nodes to be > mounted (even read-only) if master node is using the device. > > Is there any other technology (software) that can export a block device from > 1 master to couple of slave nodes ? Read only access on client nodes is good > enough. iSCSI and HyperSCSI both work with GFS, so those are options. I suppose you'd be better off answering the question of whether they are stable enough for you. --Brian Jackson > > Is anyone using some kind of network block device in production, and with > what success ? > > Thanks. > > -- > Kresimir Kukulj madmax at iskon.hr From Gareth at Bult.co.uk Thu Jul 8 20:02:15 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Thu, 08 Jul 2004 21:02:15 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: <1089316935.6121.7.camel@squizzey> Hi, Just a general comment; I spent some time getting a cluster running, in summary - for anyone interested (IMHO); a. x86 boxes cluster relatively easily once you get hold of the right docs / examples b. amd64 boxes will not cluster with x86 c. the cluster can crash relatively easily for a number of reasons d. ccsd can be a real CPU hog, esp when waiting to connect e. after a potentially silent gfs kernel crash, there's a real nice bug that leaves your CPU floating at expected levels yet the load average is up at 15+. Summary (IMHO); a. Performance is good and it does work b. It looks promising, yet still alpha/beta c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability d. Documentation is severely lacking ... I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's not quite there yet .. Regards, Gareth. On Thu, 2004-07-08 at 13:46 -0500, Brian Jackson wrote: > On Thu, 8 Jul 2004 16:47:24 +0200, Kresimir Kukulj wrote: > > > > What is the difference/development status of RedHat's (sistina) GNBD > > compared to OpenGFS GNBD ? Which one is more stable ? I see on sourceforge > > project page that OpenGFS GNBD was not updated since 2002. > > Once we (OpenGFS) found out that there were other alternatives, we > thought we had more important things to do than to maintain gnbd (with > our limited resources). It was deprecated some time ago. GFS's code > however has been maintained, and updated to newer kernels, etc. GFS's > is by far a better choice as far as those two are concerned. > > > > > I also found NBD, ENBD, DRBD but these don't support client nodes to be > > mounted (even read-only) if master node is using the device. > > > > Is there any other technology (software) that can export a block device from > > 1 master to couple of slave nodes ? 
Read only access on client nodes is good > > enough. > > iSCSI and HyperSCSI both work with GFS, so those are options. I > suppose you'd be better off answering the question of whether they are > stable enough for you. > > --Brian Jackson > > > > > Is anyone using some kind of network block device in production, and with > > what success ? > > > > Thanks. > > > > -- > > Kresimir Kukulj madmax at iskon.hr > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From notiggy at gmail.com Thu Jul 8 21:26:49 2004 From: notiggy at gmail.com (Brian Jackson) Date: Thu, 8 Jul 2004 16:26:49 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089316935.6121.7.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: > Hi, > > Just a general comment; > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > Summary (IMHO); > a. Performance is good and it does work > > b. It looks promising, yet still alpha/beta I'm sure I read somewhere, that the current code is considered beta. If I'm making that up, It's implied by the fact that the DLM is quite new, and the port to 2.6 is also pretty fresh. > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability It is if you've got shared storage that implements raid. > > d. Documentation is severely lacking ... Agreed, but that's why there's a wiki, mailing list, and irc channel. > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. I'm sure Red Hat's product for RHEL is nice, stable, and production ready. You are playing with relatively fresh code. --Brian > > Regards, > > Gareth. From kpfleming at backtobasicsmgmt.com Fri Jul 9 00:47:34 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 17:47:34 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: <40EDEB26.8020209@backtobasicsmgmt.com> Brian Jackson wrote: >>c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > It is if you've got shared storage that implements raid. Not if you are trying to avoid single points of failure, unless you have a fully redundant meshed fabric SAN, which most of us cannot afford :-) From kpfleming at backtobasicsmgmt.com Fri Jul 9 02:58:39 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 19:58:39 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> Message-ID: <40EE09DF.7010909@backtobasicsmgmt.com> Brian Jackson wrote: > iSCSI and HyperSCSI both work with GFS, so those are options. I > suppose you'd be better off answering the question of whether they are > stable enough for you. Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? 
From wim.coekaerts at oracle.com Fri Jul 9 03:04:54 2004 From: wim.coekaerts at oracle.com (Wim Coekaerts) Date: Thu, 8 Jul 2004 20:04:54 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <40EE09DF.7010909@backtobasicsmgmt.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <40EE09DF.7010909@backtobasicsmgmt.com> Message-ID: <20040709030453.GA13641@ca-server1.us.oracle.com> http://unh-iscsi.sourceforge.net/ On Thu, Jul 08, 2004 at 07:58:39PM -0700, Kevin P. Fleming wrote: > Brian Jackson wrote: > > >iSCSI and HyperSCSI both work with GFS, so those are options. I > >suppose you'd be better off answering the question of whether they are > >stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From RAWIPFEL at novell.com Fri Jul 9 03:07:15 2004 From: RAWIPFEL at novell.com (Robert Wipfel) Date: Thu, 08 Jul 2004 21:07:15 -0600 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: http://www.ardistech.com/iscsi/ >>> wim.coekaerts at oracle.com 7/8/2004 9:04:54 PM >>> http://unh-iscsi.sourceforge.net/ On Thu, Jul 08, 2004 at 07:58:39PM -0700, Kevin P. Fleming wrote: > Brian Jackson wrote: > > >iSCSI and HyperSCSI both work with GFS, so those are options. I > >suppose you'd be better off answering the question of whether they are > >stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From kpfleming at backtobasicsmgmt.com Fri Jul 9 03:16:08 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Thu, 08 Jul 2004 20:16:08 -0700 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: Message-ID: <40EE0DF8.7020303@backtobasicsmgmt.com> Robert Wipfel wrote: > http://www.ardistech.com/iscsi/ This page says "requires a recent 2.4 kernel". > http://unh-iscsi.sourceforge.net/ This page warns that their target implementation was only created to test the initiator with, and is not intended for production use. Doesn't mean it doesn't work, but that does not give me a good feeling about relying on it :-) From Gareth at Bult.co.uk Fri Jul 9 14:53:00 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 15:53:00 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> Message-ID: <1089384780.6120.35.camel@squizzey> :) I do appreciate all that, however there are some press releases out there that are not so clear .. There is certainly an implication in the news items I've seen that this is "THE GFS" code .. as opposed to being a new and unstable version .. .. Incidentally, I was being kind - I've had many kernel crashes, even after getting it going .. Regards, Gareth. On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > Hi, > > > > Just a general comment; > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > Summary (IMHO); > > > a. Performance is good and it does work > > > > b. It looks promising, yet still alpha/beta > > I'm sure I read somewhere, that the current code is considered beta. 
> If I'm making that up, It's implied by the fact that the DLM is quite > new, and the port to 2.6 is also pretty fresh. > > > > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > It is if you've got shared storage that implements raid. > > > > > d. Documentation is severely lacking ... > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > I'm sure Red Hat's product for RHEL is nice, stable, and production > ready. You are playing with relatively fresh code. > > --Brian > > > > > Regards, > > > > Gareth. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From bmarzins at redhat.com Fri Jul 9 16:21:43 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Fri, 9 Jul 2004 11:21:43 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089384780.6120.35.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> Message-ID: <20040709162142.GC23619@phlogiston.msp.redhat.com> On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > :) > > I do appreciate all that, however there are some press releases out > there that are not so clear .. > > There is certainly an implication in the news items I've seen that this > is "THE GFS" code .. as opposed to being a new and unstable version .. > > .. Incidentally, I was being kind - I've had many kernel crashes, even > after getting it going .. > > Regards, > Gareth. The code you are using is not the code currently being sold by redhat. That is the 6.0 code. You can download that in SRPM form at ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ There is fairly complete documentation for this code. However it does not use the DLM. Instead, GULM handles all the cluster manager issues. This code only runs on 2.4 kernels. The CVS code is going to be sold starting with RHEL 4. Some of the components, like the dlm are just now gotten out of the development stage. Others, like gnbd have been drastically rewritten. We REALLY appreciate all the testing that people are doing on these pieces, however, if you are trying to run something in production, I would encourage you to run the 6.0 code. -Ben Marzinski bmarzins at redhat.com > On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > > > Hi, > > > > > > Just a general comment; > > > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > > > Summary (IMHO); > > > > > a. Performance is good and it does work > > > > > > b. It looks promising, yet still alpha/beta > > > > I'm sure I read somewhere, that the current code is considered beta. > > If I'm making that up, It's implied by the fact that the DLM is quite > > new, and the port to 2.6 is also pretty fresh. > > > > > > > > c. 
Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > > It is if you've got shared storage that implements raid. > > > > > > > > d. Documentation is severely lacking ... > > > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > > > I'm sure Red Hat's product for RHEL is nice, stable, and production > > ready. You are playing with relatively fresh code. > > > > --Brian > > > > > > > > Regards, > > > > > > Gareth. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Gareth Bult > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From phillips at redhat.com Fri Jul 9 16:36:40 2004 From: phillips at redhat.com (Daniel Phillips) Date: Fri, 9 Jul 2004 12:36:40 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089384780.6120.35.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089384780.6120.35.camel@squizzey> Message-ID: <200407091236.40966.phillips@redhat.com> On Friday 09 July 2004 10:53, Gareth Bult wrote: > I do appreciate all that, however there are some press releases out > there that are not so clear .. Which press releases? > There is certainly an implication in the news items I've seen that > this is "THE GFS" code .. as opposed to being a new and unstable > version .. It's very clear that the 2.6 release is out there so that hackers can get to work on it, add to it, and go find bugs. SRPMs for the stable 6.0 release are linked from the cluster page. They build against RHEL3 kernels. > .. Incidentally, I was being kind - I've had many kernel crashes, > even after getting it going .. On 2.6? No surprise. Please post any oopses to the list. Regards, Daniel From Gareth at Bult.co.uk Fri Jul 9 16:30:45 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 17:30:45 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040709162142.GC23619@phlogiston.msp.redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> Message-ID: <1089390645.6121.68.camel@squizzey> Hi, I'm afraid it's over a year since I used a 2.4 kernel, so the SRPMS aren't much use to me personally .. If someone were to make the "tools" available to configure and maintain a cluster at the same time as the beta code, it might make more sense. As things stand however I'm afraid I found finding relevant documentation too much like hard work .vs. current code stability. If you could document the current cluster.xml file I'd be happy to try again .. I keep reading the 6.0 docs which don't document the file and explicitly state "do not edit this file by hand" .. damned if you do and damned if you don't .. not least as ccsd (if not other components) crash silently if there is an error in cluster.xml (!) Regards, Gareth. On Fri, 2004-07-09 at 11:21 -0500, Benjamin Marzinski wrote: > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > :) > > > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > > > There is certainly an implication in the news items I've seen that this > > is "THE GFS" code .. 
as opposed to being a new and unstable version .. > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > after getting it going .. > > > > Regards, > > Gareth. > > The code you are using is not the code currently being sold by redhat. > That is the 6.0 code. You can download that in SRPM form at > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > There is fairly complete documentation for this code. However it does not use > the DLM. Instead, GULM handles all the cluster manager issues. This code only > runs on 2.4 kernels. > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > like the dlm are just now gotten out of the development stage. Others, like > gnbd have been drastically rewritten. We REALLY appreciate all the testing > that people are doing on these pieces, however, if you are trying to run > something in production, I would encourage you to run the 6.0 code. > > -Ben Marzinski > bmarzins at redhat.com > > > On Thu, 2004-07-08 at 16:26 -0500, Brian Jackson wrote: > > > > > > Hi, > > > > > > > > Just a general comment; > > > > > > > > I spent some time getting a cluster running, in summary - for anyone interested (IMHO); > > > > > > > Summary (IMHO); > > > > > > > a. Performance is good and it does work > > > > > > > > b. It looks promising, yet still alpha/beta > > > > > > I'm sure I read somewhere, that the current code is considered beta. > > > If I'm making that up, It's implied by the fact that the DLM is quite > > > new, and the port to 2.6 is also pretty fresh. > > > > > > > > > > > c. Until mirroring is implemented clvmd, it's not really replacement for NFS given the stability > > > > > > It is if you've got shared storage that implements raid. > > > > > > > > > > > d. Documentation is severely lacking ... > > > > > > Agreed, but that's why there's a wiki, mailing list, and irc channel. > > > > > > > > > > > I'm sure it will be good with a little more work (!) , but I was hoping for production, and it's > not quite there yet .. > > > > > > I'm sure Red Hat's product for RHEL is nice, stable, and production > > > ready. You are playing with relatively fresh code. > > > > > > --Brian > > > > > > > > > > > Regards, > > > > > > > > Gareth. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > http://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > > Gareth Bult > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Fri Jul 9 16:42:12 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Fri, 09 Jul 2004 17:42:12 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <200407091236.40966.phillips@redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089384780.6120.35.camel@squizzey> <200407091236.40966.phillips@redhat.com> Message-ID: <1089391332.6126.75.camel@squizzey> >Which press releases? Urm, how 'bout this one .. you might not call it a "press release", but I found it via an announcement on a news site .. 
maybe it's me but I don't see the words "development", "beta" or "not working yet" listed anywhere .. ;-) (Oopses listed on IRC when discovered, someone has them.. bmarzins I think..) Regards, Gareth. --- From: Ken Preslan [email blocked] To: Linux Kernel Mailing List [email blocked] Subject: GFS cluster filesystem re-released Date: 2004-06-24 22:53:49 Hi, I'd like to announce that Red Hat has re-released the GFS cluster filesystem and its related infrastructure under the GPL. The different projects that make up the infrastructure are: GFS - shared-disk cluster file system CLVM - clustering extensions to the LVM2 logical volume manager toolset CMAN - general-purpose symmetric cluster manager DLM - general-purpose distributed lock manager CCS - cluster configuration system to manage the cluster config file GULM - alternative redundant server-based lock/cluster manager for GFS GNBD - network block device driver shares storage over a network Fence - I/O fencing system The source code and patches for 2.6 are available at http://sources.redhat.com/cluster/. 2.4 source should show up early tomorrow. We're looking for people help us work on this project so we can eventually get it included into the Linux kernel. Comments, suggestions, patches, and testers are more than welcome. --- On Fri, 2004-07-09 at 12:36 -0400, Daniel Phillips wrote: > On Friday 09 July 2004 10:53, Gareth Bult wrote: > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > Which press releases? > > > There is certainly an implication in the news items I've seen that > > this is "THE GFS" code .. as opposed to being a new and unstable > > version .. > > It's very clear that the 2.6 release is out there so that hackers can > get to work on it, add to it, and go find bugs. SRPMs for the stable > 6.0 release are linked from the cluster page. They build against RHEL3 > kernels. > > > .. Incidentally, I was being kind - I've had many kernel crashes, > > even after getting it going .. > > On 2.6? No surprise. Please post any oopses to the list. > > Regards, > > Daniel -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-4.png Type: image/png Size: 822 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From phillips at redhat.com Fri Jul 9 17:49:55 2004 From: phillips at redhat.com (Daniel Phillips) Date: Fri, 9 Jul 2004 13:49:55 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089391332.6126.75.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <200407091236.40966.phillips@redhat.com> <1089391332.6126.75.camel@squizzey> Message-ID: <200407091349.55362.phillips@redhat.com> On Friday 09 July 2004 12:42, Gareth Bult wrote: > >Which press releases? > > Urm, how 'bout this one .. you might not call it a "press release", > but I found it via an announcement on a news site .. > maybe it's me but I don't see the words "development", "beta" or "not > working yet" listed anywhere .. ;-) You found it on Linux Kernel Mailing List, and it's for 2.6. Please draw your own conclusion ;-) GFS 6.0 on 2.4 is the stable release. > (Oopses listed on IRC when discovered, someone has them.. bmarzins I > think..) Thanks. 
Daniel From amir at datacore.ch Fri Jul 9 18:24:47 2004 From: amir at datacore.ch (Amir Guindehi) Date: Fri, 09 Jul 2004 20:24:47 +0200 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089390645.6121.68.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> Message-ID: <40EEE2EF.8010000@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Gareth, | If you could document the current cluster.xml file I'd be happy to try | again .. I keep reading the 6.0 docs which don't document the file and | explicitly state "do not edit this file by hand" .. damned if you do and | damned if you don't .. not least as ccsd (if not other components) crash | silently if there is an error in cluster.xml (!) Did you find the GFS documentation I wrote? It's available at: https://open.datacore.ch/page/GFS https://open.datacore.ch/page/GFS.Install The later document includes two sample cluster.xml files for a two node setup and for a three node setup with manual fencing. Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA7uLtbycOjskSVCwRAv5cAKCJOYl+3cxdY4FP1M7Im71P1cGVUACfSzoa jiNyYrmjyCr7GckAXGVYmVM= =9/tE -----END PGP SIGNATURE----- From Gareth at Bult.co.uk Sat Jul 10 08:57:02 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 09:57:02 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <200407091349.55362.phillips@redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <200407091236.40966.phillips@redhat.com> <1089391332.6126.75.camel@squizzey> <200407091349.55362.phillips@redhat.com> Message-ID: <1089449822.6121.103.camel@squizzey> Urm, no .. I'm not on the kernel mailing list. (Are you implying 2.6 is unstable ?! It's way safer to use that 2.4 !!) I'm inclined at this point to mention Fedora and all my years using Redhat and my relatively recent move to Gentoo .. suffice to say I picked up the code off a public news site and thought it was stable enough to play with. (and it's not) How about some big notices on the source web pages to the effect that it's for experimental use only and should not be used near a production environment (?!) Regards, Gareth. On Fri, 2004-07-09 at 13:49 -0400, Daniel Phillips wrote: > On Friday 09 July 2004 12:42, Gareth Bult wrote: > > >Which press releases? > > > > Urm, how 'bout this one .. you might not call it a "press release", > > but I found it via an announcement on a news site .. > > maybe it's me but I don't see the words "development", "beta" or "not > > working yet" listed anywhere .. ;-) > > You found it on Linux Kernel Mailing List, and it's for 2.6. Please > draw your own conclusion ;-) > > GFS 6.0 on 2.4 is the stable release. > > > (Oopses listed on IRC when discovered, someone has them.. bmarzins I > > think..) > > Thanks. > > Daniel -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mailing-lists at hughesjr.com Sat Jul 10 10:47:05 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Sat, 10 Jul 2004 05:47:05 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089449822.6121.103.camel@squizzey> References: <1089449822.6121.103.camel@squizzey> Message-ID: <1089456425.10000.10.camel@Myth.home.local> Gareth, What I think everyone is saying ... not implying, but saying... is this. RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL is stable. If you are using anything else, it is not stable. Why is that so hard to understand? The 2.6 Kernel is stable ... however, it is not stable (or supported) on RHEL ... and the code GFS code for the 2.6 kernel is not recommended for use on a production machine with a 2.6 kernel. Use the GFS code for the 2.6 kernel on a production machine at your own risk. At least that is what I got out of the posts ... maybe I'm wrong though Johnny Hughes HughesJR.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From madmax at iskon.hr Sat Jul 10 18:38:57 2004 From: madmax at iskon.hr (Kresimir Kukulj) Date: Sat, 10 Jul 2004 20:38:57 +0200 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040709162142.GC23619@phlogiston.msp.redhat.com> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> Message-ID: <20040710183857.GA7532@max.zg.iskon.hr> Quoting Benjamin Marzinski (bmarzins at redhat.com): > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > :) > > > > I do appreciate all that, however there are some press releases out > > there that are not so clear .. > > > > There is certainly an implication in the news items I've seen that this > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > after getting it going .. > > > > Regards, > > Gareth. > > The code you are using is not the code currently being sold by redhat. > That is the 6.0 code. You can download that in SRPM form at > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > There is fairly complete documentation for this code. However it does not use > the DLM. Instead, GULM handles all the cluster manager issues. This code only > runs on 2.4 kernels. > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > like the dlm are just now gotten out of the development stage. Others, like Is this new DLM still dependent on single lock storage or is it distributed (like in OpenDLM) ? > gnbd have been drastically rewritten. We REALLY appreciate all the testing You are saying that GNBD is rewritten. How does it compare to GNBD in GFS-6.0 (version sold by RedHat) in stability, performance, features ? > that people are doing on these pieces, however, if you are trying to run > something in production, I would encourage you to run the 6.0 code. Thanks, I'll look into it. Does anyone use some software based shared storage like GNBD, iSCSI or HyperSCSI as an alternative to expensive FibreChannel hardware ? If you do, can you describe your experiences (how stable it is, performance, which implementation)... I believe this information will be interesting to other people too. 
Browsing the net, there are couple of variants of network block device (NBD, ENBD, DRBD) but they don't support more than one client (and both sides cannot be used at the same time). There is of course GNBD: - OpenGFS version - not maintained anymore. - GFS-6.0 version sold by RedHat (2.4 kernel). - GFS-XX version from re-released sources of GFS ported to 2.6 kernel. There are two 'target' implementations of iSCSI protocol: - http://unh-iscsi.sourceforge.net/ initiator implementation is their primary development. They have target implemented but is currently mostly used to test the initiator. Runs on 2.4 and 2.6 kernels. - http://www.ardistech.com/iscsi/ iSCSI target implementation for 2.4 kernel's only. - http://linux-iscsi.sourceforge.net/ iSCSI initiator implementation for 2.4 and 2.6 kernels. Anything else? -- Kresimir Kukulj madmax at iskon.hr +--------------------------------------------------+ Old PC's never die. They just become Unix terminals. From Gareth at Bult.co.uk Sat Jul 10 21:42:45 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 22:42:45 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <40EEE2EF.8010000@datacore.ch> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> <40EEE2EF.8010000@datacore.ch> Message-ID: <1089495765.6126.148.camel@squizzey> Hi, > Did you find the GFS documentation I wrote? > It's available at: > > https://open.datacore.ch/page/GFS > https://open.datacore.ch/page/GFS.Install I certainly did, very nice it is too .. :) > The later document includes two sample cluster.xml files for a two node > setup and for a three node setup with manual fencing. Urm, yes, it does. However, after reading some of the 6.0 docs, this covers about 2% of the things you can do with cluster.xml .. (!) After getting a cluster "working" I started looking at services, shared IP's etc .. I didn't really stand a chance with all the "don't edit by hand" stuff in the 6.0 docs - they don't list the cluster.xml's to go with the examples, they just give screen shots of the config apps ... :( Regards, Gareth. > > Regards > - - Amir > - -- > Amir Guindehi, nospam.amir at datacore.ch > DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.2-nr1 (Windows 2000) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFA7uLtbycOjskSVCwRAv5cAKCJOYl+3cxdY4FP1M7Im71P1cGVUACfSzoa > jiNyYrmjyCr7GckAXGVYmVM= > =9/tE > -----END PGP SIGNATURE----- > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-6.png Type: image/png Size: 796 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smiley-3.png Type: image/png Size: 819 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From Gareth at Bult.co.uk Sat Jul 10 21:46:24 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sat, 10 Jul 2004 22:46:24 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089456425.10000.10.camel@Myth.home.local> References: <1089449822.6121.103.camel@squizzey> <1089456425.10000.10.camel@Myth.home.local> Message-ID: <1089495984.6120.153.camel@squizzey> Hi, > What I think everyone is saying ... not implying, but saying... is > this. > > RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > is stable. If you are using anything else, it is not stable. Why is > that so hard to understand? Perhaps because for non-redhat users, 2.4 is considered "old hat" and they can't understand why Redhat is *still* using 2.4 (?!) > The 2.6 Kernel is stable ... however, it is not stable (or supported) > on RHEL ... and the code GFS code for the 2.6 kernel is not > recommended for use on a production machine with a 2.6 kernel. Use > the GFS code for the 2.6 kernel on a production machine at your own > risk. Urm, I guess I don't "have" to use 2.6, but it would be "really" painful for me not to use 2.6 .. for way more reasons than I want to list here. > At least that is what I got out of the posts ... maybe I'm wrong > though Sure, thought you should appreciate that for people who've been off 2.4 and in production on 2.6 for a long time, comments like "you should be using 2.4" are a little redundant. Regards, Gareth. > > Johnny Hughes > HughesJR.com > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From mailing-lists at hughesjr.com Sat Jul 10 22:37:32 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Sat, 10 Jul 2004 17:37:32 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089495984.6120.153.camel@squizzey> References: <1089495984.6120.153.camel@squizzey> Message-ID: <1089499052.5230.27.camel@Myth.home.local> On Sat, 2004-07-10 at 16:46, Gareth Bult wrote: > Hi, > > > >What I think everyone is saying ... not implying, but saying... is this. > > > >RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > >is stable. If you are using anything else, it is not stable. Why is > >that so hard to understand? > > > > Perhaps because for non-redhat users, 2.4 is considered "old hat" and > they can't understand why Redhat is *still* using 2.4 (?!) > RHEL is using a 2.4 kernel because that is what they chose to make stable. You are on a RedHat mailing list discussing a RedHat product. Thousands of customers running Oracle on RHEL 3 AS are quite happy that RedHat is using a 2.4.21 kernel (as an example). They are also happy that RedHat is making GFS 6.0 available for the RHEL 3 product line. > > > >The 2.6 Kernel is stable ... however, it is not stable (or supported) on > >RHEL ... and the code GFS code for the 2.6 kernel is not recommended for > >use on a production machine with a 2.6 kernel. Use the GFS code for the > >2.6 kernel on a production machine at your own risk. 
> > > > Urm, I guess I don't "have" to use 2.6, but it would be "really" painful > for me not to use 2.6 .. for way more reasons than I want to list here. > Use whatever you want ... only don't expect software that someone says is unstable to be stable. IF you want a stable GFS from RedHat ... use RHEL and GFS 6. If you want to use another distro and another GFS ... great ... just don't complain that it is not stable. > > > >At least that is what I got out of the posts ... maybe I'm wrong though > > > > Sure, thought you should appreciate that for people who've been off 2.4 > and in production on 2.6 for a long time, comments like "you should be > using 2.4" are a little redundant. Again ... you are the person who chooses what technology you deploy ... but RedHat is going to put out stable products for their supported RHEL. If you can make it work on a different distro with a different kernel, great. > > Regards, > Gareth. > > > > > Johnny Hughes > HughesJR.com > Johnny Hughes -------------- next part -------------- An HTML attachment was scrubbed... URL: From Gareth at Bult.co.uk Sun Jul 11 15:18:40 2004 From: Gareth at Bult.co.uk (Gareth Bult) Date: Sun, 11 Jul 2004 16:18:40 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089499052.5230.27.camel@Myth.home.local> References: <1089495984.6120.153.camel@squizzey> <1089499052.5230.27.camel@Myth.home.local> Message-ID: <1089559120.6120.163.camel@squizzey> Hi, Sorry, my mistake, I read the list title as "linux-cluster" as opposed to "redhat-linux-server-3.0-cluster". As I'm no longer a RH user, and now I know, I'll unsubscribe. Thanks for the clarification. Regards, Gareth. On Sat, 2004-07-10 at 17:37 -0500, Johnny Hughes wrote: > On Sat, 2004-07-10 at 16:46, Gareth Bult wrote: > > > Hi, > > > > > > >What I think everyone is saying ... not implying, but saying... is this. > > > > > > >RHEL is stable (if you use the supported kernel), and GFS 6.0 for RHEL > > >is stable. If you are using anything else, it is not stable. Why is > > >that so hard to understand? > > > > > > > > Perhaps because for non-redhat users, 2.4 is considered "old hat" and > > they can't understand why Redhat is *still* using 2.4 (?!) > > RHEL is using a 2.4 kernel because that is what they chose to make > stable. You are on a RedHat mailing list discussing a RedHat product. > Thousands of customers running Oracle on RHEL 3 AS are quite happy > that RedHat is using a 2.4.21 kernel (as an example). They are also > happy that RedHat is making GFS 6.0 available for the RHEL 3 product > line. > > > > > > > >The 2.6 Kernel is stable ... however, it is not stable (or supported) on > > >RHEL ... and the code GFS code for the 2.6 kernel is not recommended for > > >use on a production machine with a 2.6 kernel. Use the GFS code for the > > >2.6 kernel on a production machine at your own risk. > > > > > > > > Urm, I guess I don't "have" to use 2.6, but it would be "really" painful > > for me not to use 2.6 .. for way more reasons than I want to list here. > > Use whatever you want ... only don't expect software that someone says > is unstable to be stable. IF you want a stable GFS from RedHat ... > use RHEL and GFS 6. If you want to use another distro and another > GFS ... great ... just don't complain that it is not stable. > > > > > > > >At least that is what I got out of the posts ... 
maybe I'm wrong though > > > > > > > > Sure, thought you should appreciate that for people who've been off 2.4 > > and in production on 2.6 for a long time, comments like "you should be > > using 2.4" are a little redundant. > > Again ... you are the person who chooses what technology you > deploy ... but RedHat is going to put out stable products for their > supported RHEL. If you can make it work on a different distro with a > different kernel, great. > > > > > Regards, > > Gareth. > > > > > > > > > > Johnny Hughes > > HughesJR.com > > Johnny Hughes -- Gareth Bult -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From pcaulfie at redhat.com Mon Jul 12 07:49:10 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Mon, 12 Jul 2004 08:49:10 +0100 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040710183857.GA7532@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <20040710183857.GA7532@max.zg.iskon.hr> Message-ID: <20040712074909.GC11355@tykepenguin.com> On Sat, Jul 10, 2004 at 08:38:57PM +0200, Kresimir Kukulj wrote: > Quoting Benjamin Marzinski (bmarzins at redhat.com): > > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > > :) > > > > > > I do appreciate all that, however there are some press releases out > > > there that are not so clear .. > > > > > > There is certainly an implication in the news items I've seen that this > > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > > after getting it going .. > > > > > > Regards, > > > Gareth. > > > > The code you are using is not the code currently being sold by redhat. > > That is the 6.0 code. You can download that in SRPM form at > > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > > There is fairly complete documentation for this code. However it does not use > > the DLM. Instead, GULM handles all the cluster manager issues. This code only > > runs on 2.4 kernels. > > > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > > like the dlm are just now gotten out of the development stage. Others, like > > Is this new DLM still dependent on single lock storage or is it distributed > (like in OpenDLM) ? The DLM is fully distributed. -- patrick From arekm at pld-linux.org Mon Jul 12 09:36:12 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Mon, 12 Jul 2004 11:36:12 +0200 Subject: [Linux-cluster] Fwd: gfs fixes for PPC32 platform... Message-ID: <200407121136.12773.arekm@pld-linux.org> ---------- Forwarded Message ---------- Subject: gfs fixes for PPC32 platform... Date: Monday 12 of July 2004 11:30 From: Pawe? Sikora To: arekm at pld-linux.org [ fs/gfs_locking/lock_gulm/utils_verb_flags.c ]: The `strncasecmp' function confilcts with arch/ppc{,64}/lib/strcase.c Please, rename it or link with proper arch/*/lib/built-in.o [ fs/gfs/log.c ]: The sequence `switch (head_wrap - dump_wrap)' uses __ucmpdi2 (for 64-bits ops) from libgcc_s.so and finally causing `unresolved symbol' in module. 
fix: __u64 tmp = head_wrap - dump_wrap; if (tmp < 0x100000000LLU) switch ((__u32)tmp) { .... } else // default action. -- /* Copyright (C) 2003, SCO, Inc. This is valuable Intellectual Property. */ #define say(x) lie(x) ------------------------------------------------------- -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux -------------- next part -------------- A non-text attachment was scrubbed... Name: 2.6.7-ppc-strncasecmp.patch Type: text/x-diff Size: 1238 bytes Desc: not available URL: From lhh at redhat.com Mon Jul 12 13:41:24 2004 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 Jul 2004 09:41:24 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089495765.6126.148.camel@squizzey> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <1089390645.6121.68.camel@squizzey> <40EEE2EF.8010000@datacore.ch> <1089495765.6126.148.camel@squizzey> Message-ID: <1089639684.13281.146.camel@atlantis.boston.redhat.com> On Sat, 2004-07-10 at 22:42 +0100, Gareth Bult wrote: > However, after reading some of the 6.0 docs, this covers about 2% of > the things you can do with cluster.xml .. (!) True. But cluster.xml is rather different from the one in RHCS/RHGFS. > After getting a cluster "working" I started looking at services, > shared IP's etc .. I didn't really stand a chance with all the "don't > edit by hand" stuff in the 6.0 docs - they don't list the cluster. > xml's to go with the examples, they just give screen shots of the > config apps ... :( Well, the shared IPs and stuff won't really do much until Friday... ;) I hope to commit the cold-failover component to CVS by then. The config app won't be available for awhile; it was for RHCS and RHGFS; which have much less "stuff" to deal with than this project. However, I hope to expand the preliminary things I've sent to this list so that people can define resource groups by (gasp) hand editing - at least until the GUI is built. -- Lon From bmarzins at redhat.com Mon Jul 12 15:56:51 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Mon, 12 Jul 2004 10:56:51 -0500 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <20040710183857.GA7532@max.zg.iskon.hr> References: <20040708144724.GB18751@max.zg.iskon.hr> <1089316935.6121.7.camel@squizzey> <1089384780.6120.35.camel@squizzey> <20040709162142.GC23619@phlogiston.msp.redhat.com> <20040710183857.GA7532@max.zg.iskon.hr> Message-ID: <20040712155651.GE23619@phlogiston.msp.redhat.com> On Sat, Jul 10, 2004 at 08:38:57PM +0200, Kresimir Kukulj wrote: > Quoting Benjamin Marzinski (bmarzins at redhat.com): > > On Fri, Jul 09, 2004 at 03:53:00PM +0100, Gareth Bult wrote: > > > :) > > > > > > I do appreciate all that, however there are some press releases out > > > there that are not so clear .. > > > > > > There is certainly an implication in the news items I've seen that this > > > is "THE GFS" code .. as opposed to being a new and unstable version .. > > > > > > .. Incidentally, I was being kind - I've had many kernel crashes, even > > > after getting it going .. > > > > > > Regards, > > > Gareth. > > > > The code you are using is not the code currently being sold by redhat. > > That is the 6.0 code. 
You can download that in SRPM form at > > ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/RHGFS/i386/SRPMS/ > > > > There is fairly complete documentation for this code. However it does not use > > the DLM. Instead, GULM handles all the cluster manager issues. This code only > > runs on 2.4 kernels. > > > > The CVS code is going to be sold starting with RHEL 4. Some of the components, > > like the dlm are just now gotten out of the development stage. Others, like > > Is this new DLM still dependent on single lock storage or is it distributed > (like in OpenDLM) ? > > > gnbd have been drastically rewritten. We REALLY appreciate all the testing > > You are saying that GNBD is rewritten. How does it compare to GNBD in > GFS-6.0 (version sold by RedHat) in stability, performance, features ? Since it has just been rewritten, it is currently pretty unstable... The rewrite involved removing large chunks of gnbd from the kernel, and doing them in user space. This should make it easier to maintain. So once it stabilizes, it should be better. Performance testing will be done as soon as I'm happy with the stability. It should be pretty much the same... Not too much of the core functionality changed. The features are pretty much identical, except that the new code auto reconnects if it looses a connection. The rewrite was done because the block layer had some largish changes from 2.4 to 2.6, and to be more inline with the way redhat ships maintains products. -Ben Marzinski bmarzins at redhat.com From mtilstra at redhat.com Mon Jul 12 17:09:39 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Mon, 12 Jul 2004 12:09:39 -0500 Subject: [Linux-cluster] Fwd: gfs fixes for PPC32 platform... In-Reply-To: <200407121136.12773.arekm@pld-linux.org> References: <200407121136.12773.arekm@pld-linux.org> Message-ID: <20040712170939.GA1471@redhat.com> On Mon, Jul 12, 2004 at 11:36:12AM +0200, Arkadiusz Miskiewicz wrote: > ---------- Forwarded Message ---------- > > Subject: gfs fixes for PPC32 platform... > Date: Monday 12 of July 2004 11:30 > From: Pawe? Sikora > To: arekm at pld-linux.org > > [ fs/gfs_locking/lock_gulm/utils_verb_flags.c ]: > > The `strncasecmp' function confilcts with arch/ppc{,64}/lib/strcase.c > Please, rename it or link with proper arch/*/lib/built-in.o > the utils_verb_flags stuff isn't actually needed anymore, so I just removed it. Which as a side affect, should fix the compile thing you see. (odd how ppc and ppc64 seem to be the only archs that have a strncasecmp function...) Thanks for catching that. -- Michael Conrad Tadpol Tilstra Even though I feel like I might ignite, I probably won't. But I'm gonna try anyways. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From phillips at redhat.com Mon Jul 12 18:09:38 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 12 Jul 2004 14:09:38 -0400 Subject: [Linux-cluster] GNBD, how good it is ? In-Reply-To: <1089559120.6120.163.camel@squizzey> References: <1089495984.6120.153.camel@squizzey> <1089499052.5230.27.camel@Myth.home.local> <1089559120.6120.163.camel@squizzey> Message-ID: <200407121409.38256.phillips@redhat.com> On Sunday 11 July 2004 11:18, Gareth Bult wrote: > Hi, > > Sorry, my mistake, I read the list title as "linux-cluster" as > opposed to "redhat-linux-server-3.0-cluster". > > As I'm no longer a RH user, and now I know, I'll unsubscribe. > > Thanks for the clarification. 
> > Regards, > Gareth. Suit yourself, however please be accurate in your comments. You have mischaracterized this list as a Red Hat product-oriented list. It is not, it is a community forum, please read the other posts to be sure of that. Regards, Daniel From phillips at redhat.com Mon Jul 12 20:18:50 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 12 Jul 2004 16:18:50 -0400 Subject: [Linux-cluster] [ANNOUNCE] Cluster Infrastructure BOF at OLS Message-ID: <200407121618.50987.phillips@redhat.com> Hi all, There will be a BOF at OLS for those interested in hammering out issues of cluster infrastructure for Linux. http://www.linuxsymposium.org/2004/view_abstract.php?content_key=203 Friday July 23rd, 8:00 to 9:00 PM, Room D (Watch for last minute schedule changes.) The format will be: * Panel discussion, 20 minutes * Open discussion, 30 minutes * Wrapup, 10 minutes. Participants should come equipped with an adequate supply of fire retardant and/or a good flamesuit. P.S., Please remove cross-posts as appropriate if you reply. Regards, Daniel From arekm at pld-linux.org Tue Jul 13 14:25:58 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Tue, 13 Jul 2004 16:25:58 +0200 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches Message-ID: <200407131625.58495.arekm@pld-linux.org> >From qboosh at pld-linux.org - include cluster config on more arches --- linux-2.6.7/arch/ppc/Kconfig.orig 2004-07-09 23:03:07.000000000 +0000 +++ linux-2.6.7/arch/ppc/Kconfig 2004-07-10 09:17:08.000000000 +0000 @@ -1247,6 +1247,8 @@ source "lib/Kconfig" +source "cluster/Kconfig" + source "arch/ppc/oprofile/Kconfig" menu "Kernel hacking" --- linux-2.6.7/arch/ia64/Kconfig.orig 2004-07-09 23:03:06.000000000 +0000 +++ linux-2.6.7/arch/ia64/Kconfig 2004-07-10 09:19:56.000000000 +0000 @@ -368,6 +368,8 @@ source "lib/Kconfig" +source "cluster/Kconfig" + source "arch/ia64/hp/sim/Kconfig" source "arch/ia64/oprofile/Kconfig" -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From pcaulfie at redhat.com Tue Jul 13 14:47:14 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 13 Jul 2004 15:47:14 +0100 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches In-Reply-To: <200407131625.58495.arekm@pld-linux.org> References: <200407131625.58495.arekm@pld-linux.org> Message-ID: <20040713144714.GG14327@tykepenguin.com> On Tue, Jul 13, 2004 at 04:25:58PM +0200, Arkadiusz Miskiewicz wrote: > > >From qboosh at pld-linux.org - include cluster config on more arches I added a few last week: $ patch -p1 < ~/dev/cluster/cman-kernel/patches/2.6.7/00001.patch patching file arch/alpha/Kconfig patching file arch/arm/Kconfig patching file arch/arm26/Kconfig patching file arch/cris/Kconfig patching file arch/i386/Kconfig patching file arch/ia64/Kconfig patching file arch/m68k/Kconfig patching file arch/mips/Kconfig patching file arch/parisc/Kconfig patching file arch/ppc/Kconfig patching file arch/ppc64/Kconfig patching file arch/s390/Kconfig patching file arch/sh/Kconfig patching file arch/sparc/Kconfig patching file arch/sparc64/Kconfig patching file arch/um/Kconfig patching file arch/x86_64/Kconfig Are there any missing from that (apart from the non-MMU ones) ? 
:-) -- patrick From ced at md3.vsnl.net.in Tue Jul 13 23:45:23 2004 From: ced at md3.vsnl.net.in (ced) Date: Tue, 13 Jul 2004 16:45:23 -0700 Subject: [Linux-cluster] Details Wanted Message-ID: <000501c46933$78188d20$0100a8c0@cedtn> Dear Sir This is Jeyaram P from India regarding to establishing a linux based cluster. I have 20 numbers of Intel Celeron 366 Mhz processor computer. Now i want to make a cluster using the above said systems. The RAM capasity is varied from 32 MBto 128 MB system to system. Plz guide me to estblish a linux cluster. My e-mail id : sadmn_cedtn at yahoo.co.in apjram at yahoo.com Looking forward With regards Jeyaram P From john.hearns at clustervision.com Tue Jul 13 15:43:01 2004 From: john.hearns at clustervision.com (John Hearns) Date: Tue, 13 Jul 2004 16:43:01 +0100 Subject: [Linux-cluster] Details Wanted In-Reply-To: <000501c46933$78188d20$0100a8c0@cedtn> References: <000501c46933$78188d20$0100a8c0@cedtn> Message-ID: <1089733381.4373.90.camel@vigor12> On Wed, 2004-07-14 at 00:45, ced wrote: > Dear Sir > > This is Jeyaram P from India regarding to establishing a linux based > cluster. Jeyaram, do you mean a computational cluster, also known as a Beowulf cluster? You could make a start by looking at this site http://www.beowulf.org From arekm at pld-linux.org Tue Jul 13 15:45:31 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Tue, 13 Jul 2004 17:45:31 +0200 Subject: [Linux-cluster] [PATCH]: include cluster config on more arches In-Reply-To: <20040713144714.GG14327@tykepenguin.com> References: <200407131625.58495.arekm@pld-linux.org> <20040713144714.GG14327@tykepenguin.com> Message-ID: <200407131745.31536.arekm@pld-linux.org> On Tuesday 13 of July 2004 16:47, Patrick Caulfield wrote: > I added a few last week: Great. > Are there any missing from that (apart from the non-MMU ones) ? Probably not - thanks! -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From don at smugmug.com Tue Jul 13 22:22:51 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 15:22:51 -0700 Subject: [Linux-cluster] GFS limits? Message-ID: <40F460BB.4040603@smugmug.com> Hi there, I've been peripherally following GFS's progress for the last two years or so, and I'm very interested in using it. We were already on Red Hat when Sistina was acquired, so I've been waiting to see what Red Hat will do with it. But before I get ahold of the sales people, I thought I'd find out a little more about it. We have two use cases where I can see it being useful: - For our web server clusters to share a single "snapshot" of our application code amongst themselves. GFS obviously functions great in this environment and would be useful. - For our backend image data storage. We currently have 35TB of storage, and it's growing at a rapid rate. I'd like to be able to scale into hundreds of petabytes some day, and would like to select a solution early that will scale large. Migrating a few hundred TBs from one solution to another already keeps me up at night... PBs would make me go insane. This is the use case I'm not sure of with regards to GFS. Does GFS somehow get around the 1TB block device issue? Just how large can a single exported filesystem be with GFS? Our current (homegrown) solution will scale very well for quite some time, but eventually we're going to get saturated with write requests to individual head units. 
Does GFS intelligently "spread the load" among multiple storage entities for writing under high load? Does it always write to any available storage units, or are there thresholds where it expands the pool of units it writes to? (I'm not sure I'm making much sense, but we'll see if any of you grok it :) In the event of some multiple-catastrophe failure (where some data isn't online at all, let alone redundant), how graceful is GFS? Does it "rope off" the data that's not available and still allow full access to the data that is? Or does the whole cluster go down? I notice the pricing for GFS is $2200. Is that per seat? And if so, what's a "seat"? Each client? Each server with storage participating in the cluster? Both? Some other distinction? Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be able to participate as well? Whew, that should be enough to get us started. Thanks in advance! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From notiggy at gmail.com Tue Jul 13 22:50:00 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 13 Jul 2004 17:50:00 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F460BB.4040603@smugmug.com> References: <40F460BB.4040603@smugmug.com> Message-ID: On Tue, 13 Jul 2004 15:22:51 -0700, Don MacAskill wrote: > > Hi there, > > I've been peripherally following GFS's progress for the last two years > or so, and I'm very interested in using it. We were already on Red Hat > when Sistina was acquired, so I've been waiting to see what Red Hat will > do with it. But before I get ahold of the sales people, I thought I'd > find out a little more about it. > > We have two use cases where I can see it being useful: > > - For our web server clusters to share a single "snapshot" of our > application code amongst themselves. GFS obviously functions great in > this environment and would be useful. > > - For our backend image data storage. We currently have 35TB of > storage, and it's growing at a rapid rate. I'd like to be able to scale > into hundreds of petabytes some day, and would like to select a solution > early that will scale large. Migrating a few hundred TBs from one > solution to another already keeps me up at night... PBs would make me > go insane. This is the use case I'm not sure of with regards to GFS. > > Does GFS somehow get around the 1TB block device issue? Just how large > can a single exported filesystem be with GFS? The code that most people on this list are interested in currently is the code in cvs which is for 2.6 only. 2.6 has a config option to enable using devices larger than 2TB. I'm still reading through all the GFS code, but it's still architecturally the same as when it was closed source, so I'm pretty sure most of my knowledge from OpenGFS will still apply. GFS uses 64bit values internally, so you can have very large filesystems (larger than PBs). > > Our current (homegrown) solution will scale very well for quite some > time, but eventually we're going to get saturated with write requests to > individual head units. Does GFS intelligently "spread the load" among > multiple storage entities for writing under high load? No, each node that mounts has direct access to the storage. It writes just like any other fs, when it can. > Does it always > write to any available storage units, or are there thresholds where it > expands the pool of units it writes to? 
(I'm not sure I'm making much > sense, but we'll see if any of you grok it :)

I think you may have a little misconception about just what GFS is. You should check the WHATIS_OpenGFS doc at http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the most part, the same stuff applies to GFS.

> > In the event of some multiple-catastrophe failure (where some data isn't > online at all, let alone redundant), how graceful is GFS? Does it "rope > off" the data that's not available and still allow full access to the > data that is? Or does the whole cluster go down?

That's a good question that I don't know the answer to. But I'd imagine that it wouldn't be terribly happy. Sorry I don't know more. Maybe one of the GFS devs will know better.

> > I notice the pricing for GFS is $2200. Is that per seat? And if so, > what's a "seat"? Each client? Each server with storage participating > in the cluster? Both? Some other distinction?

Now I definitely know you have some misconception. GFS doesn't have any concept of server and client. All nodes mount the fs directly since they are all directly connected to the storage.

> > Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be > able to participate as well?

I'll punt to Red Hat people here.

> > Whew, that should be enough to get us started. > > Thanks in advance! > > Don >

--Brian Jackson

From rbrown at metservice.com Tue Jul 13 23:48:53 2004 From: rbrown at metservice.com (Royce Brown) Date: Wed, 14 Jul 2004 11:48:53 +1200 Subject: [Linux-cluster] node failing Message-ID: <200407141148292.SM01912@rbrown>

Hi,

I am trying to track down a problem I've been having with the clustering software on redhat 3.0 (supplied rpm's). I am running a 2-node cluster using Multicast Heartbeat, Network Tiebreaker IP address and have bonded Ethernet interfaces to different switches.

The problem is that you start the cluster and everything is working fine and then suddenly one node (always the same one) thinks the other node has become Inactive. It gets into a state where one node thinks both nodes are active and the other node thinks only it is active.

There are no networking problems that I can see. On the bad node I can ping the other node by its address and the multicast address. I have full debug mode on, but the log files don't show anything.

Has anyone else seen this problem, or can anyone give me some tips on what to look at next?

Cheers Royce

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From kpreslan at redhat.com Tue Jul 13 23:55:19 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Tue, 13 Jul 2004 18:55:19 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F460BB.4040603@smugmug.com> References: <40F460BB.4040603@smugmug.com> Message-ID: <20040713235519.GA11119@potassium.msp.redhat.com>

On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > Does GFS somehow get around the 1TB block device issue? Just how large > can a single exported filesystem be with GFS?

On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the limit is 8TB on 32-bit systems and some really large number (at least exabytes) on 64-bit systems.

> Our current (homegrown) solution will scale very well for quite some > time, but eventually we're going to get saturated with write requests to > individual head units. Does GFS intelligently "spread the load" among > multiple storage entities for writing under high load?
Does it always > write to any available storage units, or are there thresholds where it > expands the pool of units it writes to? (I'm not sure I'm making much > sense, but we'll see if any of you grok it :)

Our current allocation methods try to allocate from areas of the disk where there isn't much contention for the allocation bitmap locks. It doesn't know anything about spreading load on the basis of disk load. (That would be an interesting thing to add, but we don't have any plans to do so for the short term.)

> In the event of some multiple-catastrophe failure (where some data isn't > online at all, let alone redundant), how graceful is GFS? Does it "rope > off" the data that's not available and still allow full access to the > data that is? Or does the whole cluster go down?

Right now, a malfunctioning or non-present disk can cause the whole cluster to go down. That's assuming the error isn't masked by hardware RAID or CLVM mirroring (when we get there).

One of the next projects on my plate is fixing the filesystem so that a node will gracefully withdraw itself from the cluster when it sees a malfunctioning storage device. Each node will stay up and could potentially be able to continue accessing other GFS filesystems on other storage devices.

I/We haven't thought much about trying to get GFS to continue to function when only part of a filesystem is present.

> I notice the pricing for GFS is $2200. Is that per seat? And if so, > what's a "seat"? Each client? Each server with storage participating > in the cluster? Both? Some other distinction?

I'm not a marketing/sales person, just a code monkey, so take this with a grain of salt: It's per node running the filesystem. I don't think machines running GULM lock servers or GNBD block servers count as machines that need to be paid for.

> Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be > able to participate as well?

According to the web page, you should be able to add a GFS entitlement to all RHEL trimlines (WS, ES, and AS).

http://www.redhat.com/apps/commerce/rha/gfs/

-- Ken Preslan

From ebpeele2 at pams.ncsu.edu Wed Jul 14 02:18:37 2004 From: ebpeele2 at pams.ncsu.edu (Elliot Peele) Date: Tue, 13 Jul 2004 22:18:37 -0400 Subject: [Linux-cluster] GFS limits? In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <1089771517.11645.8.camel@localhost.localdomain>

On Tue, 2004-07-13 at 18:55 -0500, Ken Preslan wrote: > On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > > Does GFS somehow get around the 1TB block device issue? Just how large > > can a single exported filesystem be with GFS? > > On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the > limit is 8TB on 32-bit systems and some really large number (at least > exabytes) on 64-bit systems.

The file system size limit under 2.4 is 2TB; this can be changed to 4TB if your kernel has the LBD (Large Block Device) patches. Really, the only change is using an unsigned int instead of a signed int.

There are rpms for GFS for 2.4.21-15.EL. I have kernel packages for 2.4.21-15.EL that have xfs and lbd patches if you want them.

Elliot

-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:

From don at smugmug.com Wed Jul 14 03:17:07 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:17:07 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: References: <40F460BB.4040603@smugmug.com> Message-ID: <40F4A5B3.7010806@smugmug.com>

Brian Jackson wrote: > > The code that most people on this list are interested in currently is > the code in cvs which is for 2.6 only. 2.6 has a config option to > enable using devices larger than 2TB. I'm still reading through all > the GFS code, but it's still architecturally the same as when it was > closed source, so I'm pretty sure most of my knowledge from OpenGFS > will still apply. GFS uses 64bit values internally, so you can have > very large filesystems (larger than PBs). >

This is nice. I was specifically thinking of 64bit machines, in which case, I'd expect it to be 9EB or something.

> >>Our current (homegrown) solution will scale very well for quite some >>time, but eventually we're going to get saturated with write requests to >>individual head units. Does GFS intelligently "spread the load" among >>multiple storage entities for writing under high load? > > > No, each node that mounts has direct access to the storage. It writes > just like any other fs, when it can. >

So, if I have a dozen separate arrays in a given cluster, it will write data linearly to array #1, then array #2, then array #3? If that's the case, GFS doesn't solve my biggest fear - write performance with a huge influx of data. I'd hoped it might somehow "stripe" the data across individual units so that we can aggregate the combined interface bandwidth to some extent.

> >>Does it always >>write to any available storage units, or are there thresholds where it >>expands the pool of units it writes to? (I'm not sure I'm making much >>sense, but we'll see if any of you grok it :) > > > I think you may have a little misconception about just what GFS is. > You should check the WHATIS_OpenGFS doc at > http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the > most part, the same stuff applies to GFS. >

I've read it, and quite a few other documents and whitepapers on GFS quite a few times, but perhaps you're right - I must be missing something. More on this below...

>>I notice the pricing for GFS is $2200. Is that per seat? And if so, >>what's a "seat"? Each client? Each server with storage participating >>in the cluster? Both? Some other distinction? > > > Now I definitely know you have some misconception. GFS doesn't have > any concept of server and client. All nodes mount the fs directly > since they are all directly connected to the storage. >

Hmm, yes, this is probably my sticking point. It was my understanding (or maybe just my hope?) that servers could participate as "storage units" in the cluster by exporting their block devices, in addition to FC or iSCSI or whatever devices which aren't technically 'servers'.

In other words, I was thinking/hoping that the cluster consisted of block units aggregated into a filesystem, and that the filesystem could consist of FC RAID devices, iSCSI solutions, and "dumb servers" that just exported their local disks to the cluster FS.

Am I totally wrong? I guess it's GNBD I don't totally understand, so I'd better go read up on it.

Thanks,

Don

-------------- next part -------------- A non-text attachment was scrubbed...
Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL:

From don at smugmug.com Wed Jul 14 03:25:35 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:25:35 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <40F4A7AF.7030009@smugmug.com>

Ken Preslan wrote: > > > Our current allocation methods try to allocate from areas of the disk > where there isn't much contention for the allocation bitmap locks. It > doesn't know anything about spreading load on the basis of disk load. > (That would be an interesting thing to add, but we don't have any plans > to do so for the short term.) > >

My use case isn't very standard. Rather than needing tons of read/write random access all over the disk, we're almost completely linear write-once-per-file, read-many operations.

We do photo sharing and storage. So lots and lots of photos get uploaded, and they're serially stored on disk. Once they're on disk, though, they're rarely modified. Just read.

It's foreseeable, though, that in the future we won't be able to push these linear writes to disk fast enough as people upload photos. Either the interface (GigE, iSCSI, Fibre Channel) isn't fast enough or whatever. It's way out in the future, but it'll come faster than I like to think about.

In that case, we need a nice way to spread those writes across multiple disks/servers/whatever. GigE bonding might solve it temporarily, but that can only last so far.

Ideally, I want to scale horizontally (tons of cheap Linux boxes attached to big disks) and have the writes "passed out" among those boxes. If I have to write my own stuff to do that, fine. But if GFS can potentially provide something along those lines down the road, great.

>>In the event of some multiple-catastrophe failure (where some data isn't >>online at all, let alone redundant), how graceful is GFS? Does it "rope >>off" the data that's not available and still allow full access to the >>data that is? Or does the whole cluster go down? > > > Right now, a malfunctioning or non-present disk can cause the whole > cluster to go down. That's assuming the error isn't masked by hardware > RAID or CLVM mirroring (when we get there). > > One of the next projects on my plate is fixing the filesystem so that a > node will gracefully withdraw itself from the cluster when it sees a > malfunctioning storage device. Each node will stay up and could > potentially be able to continue accessing other GFS filesystems on > other storage devices. > > I/We haven't thought much about trying to get GFS to continue to function > when only part of a filesystem is present. >

When I'm talking about petabytes, this weighs on my mind heavily. I can't have a power outage that takes out a couple of nodes (nodes which may hold both copies of the "redundant data" for, say, 10TB) take down a 20PB cluster. I realize 20PB sounds fairly ridiculous at the moment, but I can see it coming. And it's a management nightmare when it's spread across small 1TB block devices all over the place instead of an aggregate volume. I'm sure it's a software nightmare to think of the aggregate volume, but that's not my problem. :)

> >>I notice the pricing for GFS is $2200. Is that per seat? And if so, >>what's a "seat"? Each client? Each server with storage participating >>in the cluster? Both? Some other distinction?
> > > I'm not a marketing/sales person, just a code monkey, so take this with > a grain of salt: It's per node running the filesystem. I don't think > machines running GULM lock servers or GNBD block servers count as machine > that need to be paid for. > Looks like I have more reading to do, since apparently I don't totally get what a GNDB block server is. Or a GULM lock server, for that matter. > >>Is AS a prereq for clients? Servers? Both? Or will ES and WS boxes be >>able to participate as well? > > > According to the web page, you should be able to add a GFS entitlement to > all RHEL trimlines (WS, ES, and AS). > > http://www.redhat.com/apps/commerce/rha/gfs/ > Thanks! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From don at smugmug.com Wed Jul 14 03:42:48 2004 From: don at smugmug.com (Don MacAskill) Date: Tue, 13 Jul 2004 20:42:48 -0700 Subject: [Linux-cluster] GFS limits? In-Reply-To: <1089771517.11645.8.camel@localhost.localdomain> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <1089771517.11645.8.camel@localhost.localdomain> Message-ID: <40F4ABB8.2000206@smugmug.com> Elliot Peele wrote: > On Tue, 2004-07-13 at 18:55 -0500, Ken Preslan wrote: > >>On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: >> >>>Does GFS somehow get around the 1TB block device issue? Just how large >>>can a single exported filesystem be with GFS? >> >>On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the >>limit is 8TB on 32-bit systems and some really large number (at least >>exabytes) on 64-bit systems. > > > The file system size limit under 2.4 is 2TB, this can be changed to 4TB > if your kernel has the LBD (Large Block Device) patches. Really to only > change is using a unsigned int instead of a signed int. > > There are rpms for GFS for 2.4.21-15.EL. I have kernel packages for > 2.4.21-15.EL that have xfs and lbd patches if you want them. > > Elliot I'd love to take a look at the LBD patches, yes. I've currently got systems with 2 1.2TB filesystems attached, and I'd really like to use md or LVM or something to combine them to be one fs. But that goes beyond the 2TB limit.... :) I'm on 2.4.21-15.0.3.EL right now, but I can hop back a revision to play with this. I wish we could use XFS, but until RH supports it, I'm afraid it's a no-go. Sucks, too, since I had to migrate many TBs of storage from XFS to ext3 when we moved from SuSE Enterprise to RHEL3. What a pain... Thanks! Don -------------- next part -------------- A non-text attachment was scrubbed... Name: don.vcf Type: text/x-vcard Size: 253 bytes Desc: not available URL: From notiggy at gmail.com Wed Jul 14 03:59:59 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 13 Jul 2004 22:59:59 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F4A5B3.7010806@smugmug.com> References: <40F460BB.4040603@smugmug.com> <40F4A5B3.7010806@smugmug.com> Message-ID: On Tue, 13 Jul 2004 20:17:07 -0700, Don MacAskill wrote: > > > Brian Jackson wrote: > > > > > The code that most people on this list are interested in currently is > > the code in cvs which is for 2.6 only. 2.6 has a config option to > > enable using devices larger than 2TB. I'm still reading through all > > the GFS code, but it's still architecturally the same as when it was > > closed source, so I'm pretty sure most of my knowledge from OpenGFS > > will still apply. 
GFS uses 64bit values internally, so you can have > > very large filesystems (larger than PBs). > > > > This is nice. I was specifically thinking of 64bit machines, in which > case, I'd expect it to be 9EB or something. > > > > >>Our current (homegrown) solution will scale very well for quite some > >>time, but eventually we're going to get saturated with write requests to > >>individual head units. Does GFS intelligently "spread the load" among > >>multiple storage entities for writing under high load? > > > > > > No, each node that mounts has direct access to the storage. It writes > > just like any other fs, when it can. > > > > So, if I have a dozen seperate arrays in a given cluster, it will write > data linearly to array #1, then array #2, then array #3? If that's the > case, GFS doesn't solve my biggest fear - write performance with a huge > influx of data. I'd hoped it might somehow "stripe" the data across > individual units so that we can aggregate the combined interface > bandwidth to some extent. That's not the job of the filesystem, that should be done at the block layer with clvm/evms2/etc. > > > > > >>Does it always > >>write to any available storage units, or are there thresholds where it > >>expands the pool of units it writes to? (I'm not sure I'm making much > >>sense, but we'll see if any of you grok it :) > > > > > > I think you may have a little misconception about just what GFS is. > > You should check the WHATIS_OpenGFS doc at > > http://opengfs.sourceforge.net/docs.php It says OpenGFS, but for the > > most part, the same stuff applies to GFS. > > > > I've read it, and quite a few other documents and whitepapers on GFS > quite a few times, but perhaps you're right - I must be missing > something. More on this below... > > > >>I notice the pricing for GFS is $2200. Is that per seat? And if so, > >>what's a "seat"? Each client? Each server with storage participating > >>in the cluster? Both? Some other distinction? > > > > > > Now I definitely know you have some misconception. GFS doesn't have > > any concept of server and client. All nodes mount the fs directly > > since they are all directly connected to the storage. > > > > Hmm, yes, this is probably my sticking point. It was my understanding > (or maybe just my hope?) that servers could participate as "storage > units" in the cluster by exporting their block devices, in addition to > FC or iSCSI or whatever devices which aren't techincally 'servers'. You can technically use anything that the kernel sees as a block device, but I'd hesitate to put gnbd (and a few other solutions) into a production environment currently. > > In other words, I was thinking/hoping that the cluster consisted of > block units aggregated into a filesystem, and that the filesystem could > consist of FC RAID devices, iSCSI solutions, and "dumb servers" that > just exported their local disks to the cluster FS. Like I said, you can techincally do it, but it's not the filesystems job, that should all happen at the block layer. > > Am I totally wrong? I guess it's GNDB I don't totally understand, so > I'd better go read up on it. GNBD is just a way to export a block device to another host over a network (similar in concept to iSCSI/HyperSCSI) --Brian > > Thanks, > > Don > > > > From ebpeele2 at pams.ncsu.edu Wed Jul 14 04:36:20 2004 From: ebpeele2 at pams.ncsu.edu (Elliot Peele) Date: Wed, 14 Jul 2004 00:36:20 -0400 Subject: [Linux-cluster] GFS limits? 
In-Reply-To: <40F4ABB8.2000206@smugmug.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <1089771517.11645.8.camel@localhost.localdomain> <40F4ABB8.2000206@smugmug.com> Message-ID: <1089779780.11645.14.camel@localhost.localdomain>

On Tue, 2004-07-13 at 20:42 -0700, Don MacAskill wrote: > I'd love to take a look at the LBD patches, yes. I've currently got > systems with 2 1.2TB filesystems attached, and I'd really like to use md > or LVM or something to combine them to be one fs. But that goes beyond > the 2TB limit.... :)

You can find my rpms at: ftp://mirror.physics.ncsu.edu/pub/contrib/ebpeele2/cls_xfs

and gfs rpms for that kernel at: ftp://mirror.physics.ncsu.edu/pub/contrib/ebpeele2/cls_gfs

Elliot

-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:

From hlawatschek at atix.de Wed Jul 14 09:48:19 2004 From: hlawatschek at atix.de (Mark Hlawatschek) Date: Wed, 14 Jul 2004 11:48:19 +0200 Subject: [Linux-cluster] GNBD, how good it is ? Message-ID: <1089798498.5012.7.camel@falballa.gallien.atix>

Hi,

you'll find our iSCSI target server based on Intel's iSCSI reference implementation near http://www.atix.de/iscsi-target

Brian Jackson wrote: > > > iSCSI and HyperSCSI both work with GFS, so those are options. I > > suppose you'd be better off answering the question of whether they are > > stable enough for you. > > Speaking of iSCSI, is anyone aware of a GPL Linux 2.6 iSCSI target? >

-- Gruss / Regards, Dipl.-Ing. Mark Hlawatschek ** ATIX - Ges. fuer Informationstechnologie und Consulting mbh Einsteinstr. 10 D-85716 Unterschleissheim Company HomePage: www.atix.de SAN Division : www.san-time.com

From lhh at redhat.com Wed Jul 14 15:20:04 2004 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 Jul 2004 11:20:04 -0400 Subject: [Linux-cluster] node failing In-Reply-To: <200407141148292.SM01912@rbrown> References: <200407141148292.SM01912@rbrown> Message-ID: <1089818404.31623.88.camel@atlantis.boston.redhat.com>

On Wed, 2004-07-14 at 11:48 +1200, Royce Brown wrote: > I am trying to track down a problem I've been having with the > clustering software on redhat 3.0 (supplied rpm's).

This would be taroon-list material, actually.

> I am running a 2-node cluster using Multicast Heartbeat, Network > Tiebreaker IP address and have bonded Ethernet interfaces to different > switches.

Good. Try running in HA-bonded/failover mode if you're not already.

> There are no networking problems that I can see. On the bad node I can > ping the other node by its address and the multicast address. I have > full debug mode on, but the log files don't show anything.

You should file a support ticket with Red Hat Support: http://www.redhat.com/apps/support

> Has anyone else seen this problem, or can anyone give me some tips on what to > look at next?

Try the latest package from the RHN beta channel if you have access to it; it fixes a problem which causes membership to enter an infinite loop in some cases where timeouts occurred. The infinite loop causes multiple clumembd (or cluquorumd) processes to appear.

Here's a ref to the bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126316

-- Lon

From cknowlton at science.edu Wed Jul 14 16:08:10 2004 From: cknowlton at science.edu (Carlos Knowlton) Date: Wed, 14 Jul 2004 11:08:10 -0500 Subject: [Linux-cluster] GFS limits?
In-Reply-To: <20040713235519.GA11119@potassium.msp.redhat.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> Message-ID: <40F55A6A.4010809@science.edu>

Ken Preslan wrote: >On Tue, Jul 13, 2004 at 03:22:51PM -0700, Don MacAskill wrote: > > >>Does GFS somehow get around the 1TB block device issue? Just how large >>can a single exported filesystem be with GFS? >> >> > >On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the >limit is 8TB on 32-bit systems and some really large number (at least >exabytes) on 64-bit systems. > >

On the 2.6 kernel, I've heard that the Ext2/3fs can handle 16TB by increasing the block size (I'm not sure if that number is with 4K or 8K blocks).

Is it possible to increase the FS capacity in GFS on 32bit systems in the same way (ie, by increasing the block size)? If so, what is the maximum block size supported in GFS?

Thanks! Carlos

From kpreslan at redhat.com Wed Jul 14 17:18:17 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 14 Jul 2004 12:18:17 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F55A6A.4010809@science.edu> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <40F55A6A.4010809@science.edu> Message-ID: <20040714171817.GA14278@potassium.msp.redhat.com>

On Wed, Jul 14, 2004 at 11:08:10AM -0500, Carlos Knowlton wrote: > >On Linux 2.4-based kernels, the limit is 1TB. On 2.6-based kernels, the > >limit is 8TB on 32-bit systems and some really large number (at least > >exabytes) on 64-bit systems. > > > > > On the 2.6 kernel, I've heard that the Ext2/3fs can handle 16TB by > increasing the block size (I'm not sure if that number is with 4K or 8K > blocks). > > Is it possible to increase the FS capacity in GFS on 32bit systems in > the same way (ie, by increasing the block size)? If so, what is the > maximum block size supported in GFS?

The difference in quoted max size limits (1 or 2 TB on 2.4 and 8 or 16 TB on 2.6) shows up because some people trust block/page numbers that use the sign bit and some don't. It may very well be possible to go to larger sizes on your hardware and drivers, but you need to check to verify that yourself. I'm paranoid and will quote you the smaller number. :-)

-- Ken Preslan

From deks at sbcglobal.net Wed Jul 14 18:35:21 2004 From: deks at sbcglobal.net (Dexter Eugenio) Date: Wed, 14 Jul 2004 11:35:21 -0700 (PDT) Subject: [Linux-cluster] GFS configuration help Message-ID: <20040714183521.92044.qmail@web81704.mail.yahoo.com>

Hi,

Need your help for my proposed setup below.

1. machine1 is the host server that is connected to the SAN, mounts the filesystem. This machine has read-write capability on the filesystem.
2. machine2, machine3, machine4 etc.. are the servers that mount the same filesystem with read-only capability. They are connected to a fiber switch.
3. future machines will be connected to the switch to mount the same filesystem as read only.

Upon reading various docs, it says that I have to configure clustering? I'm not sure if that is needed in my setup. All I want to do is to emulate NFS capability, but instead of using the network, my machines are connected directly to the SAN. I might be wrong?

Btw, I'm running RH ES 3.0 and have the GFS rpms from Redhat. I've installed it fine but I have no idea on how to configure it.

I hope you can help me with any information you can give.
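For a layout like the one just described (one read-write node, several read-only nodes, all on the same SAN), the broad strokes of a GFS 6.0 setup look roughly like the sketch below. This is only a sketch: the pool names, cluster name, CCS directory and journal count are invented, the contents of the .ccs files are omitted, and the Administrators Guide linked in the next reply is the authoritative walkthrough.

    # define pools on the shared storage (a small one for the CCS archive,
    # one for the filesystem), then activate them on every node
    pool_tool -c pools.cfg
    pool_assemble -a

    # build the CCS archive from cluster.ccs/nodes.ccs/fence.ccs and start the daemons
    ccs_tool create /root/mycluster-ccs /dev/pool/cca_pool
    ccsd -d /dev/pool/cca_pool     # on every node
    lock_gulmd                     # on every node; the lock servers are named in cluster.ccs

    # make the filesystem once, then mount it on each machine
    gfs_mkfs -p lock_gulm -t mycluster:gfs0 -j 4 /dev/pool/gfs0
    mount -t gfs /dev/pool/gfs0 /mnt/gfs          # the read-write machine
    mount -t gfs -o ro /dev/pool/gfs0 /mnt/gfs    # the read-only machines

Note that even the read-only mounts still talk to the lock manager, so as far as I can tell the clustering and fencing sections of the guide still apply to machines 2 through N.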
Regards, Deks

From danderso at redhat.com Wed Jul 14 18:48:28 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 14 Jul 2004 13:48:28 -0500 Subject: [Linux-cluster] GFS configuration help In-Reply-To: <20040714183521.92044.qmail@web81704.mail.yahoo.com> References: <20040714183521.92044.qmail@web81704.mail.yahoo.com> Message-ID: <200407141348.28153.danderso@redhat.com>

Deks,

The GFS 6.0.0 Administrators Guide should walk you through everything you need to do: http://www.redhat.com/docs/manuals/csgfs/admin-guide/

On Wednesday 14 July 2004 13:35, Dexter Eugenio wrote: > Hi, > > Need your help for my proposed setup below. > > 1. machine1 is the host server that is connected to the SAN, mounts the > filesystem. This machine has read-write capability on the filesystem. > 2. machine2, machine3, machine4 etc.. are the servers that mount the same > filesystem with read-only capability. They are connected to a fiber switch. > 3. future machines will be connected to the switch to mount the same > filesystem as read only. > > Upon reading various docs, it says that I have to configure clustering? I'm > not sure if that is needed in my setup. All I want to do is to emulate NFS > capability, but instead of using the network, my machines are connected > directly to the SAN. I might be wrong? > > Btw, I'm running RH ES 3.0 and have the GFS rpms from Redhat. I've installed > it fine but I have no idea on how to configure it. > > I hope you can help me with any information you can give. > > Regards, > Deks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster

From kpreslan at redhat.com Wed Jul 14 18:46:45 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 14 Jul 2004 13:46:45 -0500 Subject: [Linux-cluster] GFS Performance In-Reply-To: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C13554584B2D5C@mwjdc2.mweb.com> Message-ID: <20040714184645.GA15045@potassium.msp.redhat.com>

It's not the directory that's causing the slowness, but the fact that the "ls -la" tries to do a stat() on the file that's being written to by node 1. Node 1 has to sync out all the dirty data in its cache before it can release the lock to node 2. This can take a while if Node 1 has a big (and full) cache.

You can do an ls without the -l option, so it won't stat() the files in the directory. That should be faster.

The ultimate solution is to add buffer forwarding to GFS, so node 1 can give node 2 stat() information without having to flush all its data. But that's a ways off.

On Thu, Jul 08, 2004 at 02:27:38PM +0200, Richard Mayhew wrote: > Hi > > > I setup 2 nodes, on my EMC SAN. Both nodes see the storage and can > access the cca device. > When writing a file to the storage fs, the second node takes a couple of > seconds to see the changes. > > Ie. > 1. Node 1 Creates the file "dd if=/dev/zero of=test.file bs=4096 > count=10240000" > 2. Doing a ls -la on node 2 takes a few seconds to display the contents > of the dir. > > After the file has finished being updated, all listings of that dir are > quick, but if any changes are made, one again has to wait for the system > to display the contents of the dir. > > Any idea?
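A quick way to see the difference Ken describes, from a second node while the first node is still writing (the mount point and directory here are made up):

    time ls /mnt/gfs/testdir        # readdir only, no stat(), returns quickly
    time ls -la /mnt/gfs/testdir    # stat()s the file being written, so it waits
                                    # for node 1 to flush its cache and release the lock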
> > > > -- > > Regards > > Richard Mayhew > Unix Specialist > > MWEB Business > Tel: + 27 11 340 7200 > Fax: + 27 11 340 7288 > Website: www.mwebbusiness.co.za > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From ninja at slaphack.com Thu Jul 15 03:36:05 2004 From: ninja at slaphack.com (David Masover) Date: Wed, 14 Jul 2004 22:36:05 -0500 Subject: [Linux-cluster] (gfs or coda) and reiser4 Message-ID: <40F5FBA5.3070808@slaphack.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've been having trouble finding a good distributed FS. All I want is one server with a ton of space and some aggressive client-side caching - -- so, basically, CODA (or maybe GFS). But I like the performance of Reiser4, and I like the InterMezzo concept, where the local cache might even be used as an ordinary filesystem in a pinch. Also, by God, on the server side, Samba and NFS are the two easiest to set up, in that order. I should NOT need a dedicated partition -- it should just access local files. I like the speed of reiser4 (among other things), so I want to use that for the data storage and cache. What I really want is a working implementation of InterMezzo on Linux 2.6, but that isn't going to happen. It looks like the only way to do what I want here is to code something myself. Is there a better way? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iQIVAwUBQPX7pHgHNmZLgCUhAQJgHg/+LtxH0lXbfZ+2eP2CEn84hLUZqUWYR81n 3qugRt+jW73/RssQJwEMZymjGwZqZvKP7T4nk1wSjSNo5okiqIHgH8wKCgQtMqzS RZlNAUxFbs2Z4sah07i2Tqt8rRKbM2ppT4I11WCcPJ5MD6u2mI/pmGJZE1XjGSz/ iygz+PHEbUqegwbnn2ayHC3oc1YZXDwGxDZjdjPUdWylU152RT8BfauaIclsrVTJ bVck9Uofax6aDkxCBgE811/ePTnmm8Hwf02V2aIFrPg9qZkXXK+zBH9R61nTtyzV SI8E1yGcjvYXe/0ywomnxYuis2A8M/x4Yv/0A5zLmgngu6x3bVzKqBn8HijedWc5 H9i5KZVrFMzYg+y4QMAb1EHHOPeFAt2yI5w2S+qaZ6rcmiExvgKuwHIUqnPH0hRp GWuF6jbpMgN9PsqueqXQO4rU8D72skwx2K+P77juOXiB5lryXelfTN05VfsULkdy oDa1w5xViUsAq0JVozi7k615eSMoKFVHmU/9CRO3nUKMZdA3iMaXpv1BMJf+j4mL 0PMgoHZTGAbYhUdj6V+Uab4arOX6Nwk19Ff4R4UFgZVEfYgQH8/3zpxNhvKm117T 2zP40mByFH6e+PkIpulBvciUWdtqaH9sqm8ppfgErQZpGW6lTGok/I8dWrJEz2O7 94mmoI/HtyQ= =WnAk -----END PGP SIGNATURE----- From rmayhew at mweb.com Thu Jul 15 09:38:02 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Thu, 15 Jul 2004 11:38:02 +0200 Subject: [Linux-cluster] GFS configuration help Message-ID: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> Hi, Ill send you the documentation away from this list (its 5MB) -----Original Message----- From: Dexter Eugenio [mailto:deks at sbcglobal.net] Sent: 14 July 2004 08:35 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS configuration help Hi, Need your help for my proposed setup below. 1. machine1 is the host server that is connected to the SAN, mounts the filesystem. This machine has read-write capability on the filesystem. 2. machine2, machine3, machine4 etc.. are the servers that mounts the same filesystem with read only capability. they are connected to a fiber switch. 3. future machines will be connected to the switch to mount the same filesystem as read only. Upon reading various docs, it says that i have to configure clustering? I'm not sure if that is needed in my setup. All i want to do is to emulate NFS capability, but instead of using the network, my machines are connected directly to the SAN. I might be wrong? Btw, i'm running RH ES 3.0 and has the GFS rpms from Redhat. 
I've installed it fine but I have no idea on how to configure it. I hope you can help me with any information you can give. Regards, Deks -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From jake.gold-gfs at hypermediasystems.com Thu Jul 15 18:21:20 2004 From: jake.gold-gfs at hypermediasystems.com (Jake Gold) Date: Thu, 15 Jul 2004 11:21:20 -0700 Subject: [Linux-cluster] GFS configuration help In-Reply-To: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> References: <91C4F1A7C418014D9F88E938C1355458609CAC@mwjdc2.mweb.com> Message-ID: <20040715112120.1b63fbd2.jake.gold-gfs@hypermediasystems.com> All, Are there any special concerns or steps when using one read-write node and many read-only nodes? In this scenerio do you still have to setup all the usual components in the same way (locking, fencing, ...) ? Can anything be left out when you only have one node doing writes? How many people are doing this? Anyone know of any how-tos/documents regarding this specific configuration? Thanks to everyone at Sistina and Red Hat for all their hard work on GFS! Thanks, Jake On Thu, 15 Jul 2004 11:38:02 +0200 "Richard Mayhew" wrote: > Hi, > Ill send you the documentation away from this list (its 5MB) > > -----Original Message----- > From: Dexter Eugenio [mailto:deks at sbcglobal.net] > Sent: 14 July 2004 08:35 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] GFS configuration help > > Hi, > > Need your help for my proposed setup below. > > 1. machine1 is the host server that is connected to the SAN, mounts the > filesystem. This machine has read-write capability on the filesystem. > 2. machine2, machine3, machine4 etc.. are the servers that mounts the > same filesystem with read only capability. they are connected to a fiber > switch. > 3. future machines will be connected to the switch to mount the same > filesystem as read only. > > Upon reading various docs, it says that i have to configure clustering? > I'm not sure if that is needed in my setup. All i want to do is to > emulate NFS capability, but instead of using the network, my machines > are connected directly to the SAN. I might be wrong? > > Btw, i'm running RH ES 3.0 and has the GFS rpms from Redhat. I've > installed it fine but I have no idea on how to configure it. > > I hope you can help me with any information you can give. > > Regards, > Deks > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From ed.mann at choicepoint.com Thu Jul 15 14:39:35 2004 From: ed.mann at choicepoint.com (Edward Mann) Date: Thu, 15 Jul 2004 09:39:35 -0500 Subject: [Linux-cluster] gfs 6.0 Firewire Message-ID: <1089902374.24755.19.camel@storm.cp-direct.com> Hello, I am using RedHat Linux Enterprise Server 3. GFS 6.0 and a firewire drive. I have done the setup and created all the files, created the gfs drive with gfs_mkfs. But when i go to mount it the mount just hangs and never returns any errors. I have let it run all night and it still has not mounted the file system. I have made sure that all my modules are installed that were listed in the docs. And all other programs are running. ccsd is running lock_gulmd is running. All i have are two machines that need to share storage off 1 drive. I am using the fence_manual. I hope that i am using it right. 
This is what my fence.ccs file looks like fence_devices { admin { agent = "fence_manual" } } Is this setup right? Any help would be appreciated. Thanks. From linux-cluster-subscription at swapoff.org Fri Jul 16 04:44:37 2004 From: linux-cluster-subscription at swapoff.org (Alec Thomas) Date: Fri, 16 Jul 2004 14:44:37 +1000 Subject: [Linux-cluster] GFS limits? In-Reply-To: <40F4A7AF.7030009@smugmug.com> References: <40F460BB.4040603@smugmug.com> <20040713235519.GA11119@potassium.msp.redhat.com> <40F4A7AF.7030009@smugmug.com> Message-ID: <20040716044437.GA22945@swapoff.org> > It's forseeable in the future, though, to where we can't push these > linear writes to disk fast enough as people upload photos. Either the > interface (GigE, iSCSI, Fibre Channel) isn't fast enough or whatever. > It's way out in the future, but it'll come faster than I like to think > about. > > In that case, we need a nice way to spread those writes across multiple > disks/servers/whatever. GigE bonding might solve it temporarily, but > that can only last so far. > Don, It seems you want something more like Lustre (http://www.lustre.org/): "The central target in this project is the development of Lustre, a next-generation cluster file system which can serve clusters with 10,000's of nodes, petabytes of storage, move 100's of GB/sec with state of the art security and management infrastructure." Alec -- Evolution: Taking care of those too stupid to take care of themselves. From mauelshagen at redhat.com Thu Jul 15 20:35:53 2004 From: mauelshagen at redhat.com (Heinz Mauelshagen) Date: Thu, 15 Jul 2004 22:35:53 +0200 Subject: [Linux-cluster] *** Announcement: dmraid 1.0.0-rc2 *** Message-ID: <20040715203553.GA18616@redhat.com> *** Announcement: dmraid 1.0.0-rc2 *** Following a good tradition, dmraid 1.0.0-rc2 is available at http://people.redhat.com:/~heinzm/sw/dmraid/ in source and i386 rpm, before I leave for a 2 weeks vacation trip followed by LWE ;) Won't read my email before July, 30th. dmraid (Device-Mapper Raid tool) discovers, [de]activates and displays properties of software RAID sets (ie. ATARAID) and contained DOS partitions using the device-mapper runtime of the 2.6 kernel. The following ATARAID types are supported on Linux 2.6: Highpoint HPT37X Highpoint HPT45X Intel Software RAID Promise FastTrack Silicon Image Medley This ATARAID type can be discovered only in this version: LSI Logic MegaRAID Please provide insight to support those metadata formats completely. Thanks. See files README and CHANGELOG, which come with the source tarball for prerequisites to run this software, further instructions on installing and using dmraid! CHANGELOG is contained below for your convenience as well. Call for testers: ----------------- I need testers with the above ATARAID types, to check that the mapping created by this tool is correct (see options "-t -ay") and access to the ATARAID data is proper. You can activate your ATARAID sets without danger of overwriting your metadata, because dmraid accesses it read-only unless you use option -E with -r in order to erase ATARAID metadata (see 'man dmraid')! This is a release candidate version so you want to have backups of your valuable data *and* you want to test accessing your data read-only first in order to make sure that the mapping is correct before you go for read-write access. The author is reachable at . Later, I told you ;) For test results, mapping information, discussions, questions, patches, enhancement requests and the like, please subscribe and mail to . 
CHANGELOG: --------- Changelog from dmraid 1.0.0-rc1 to 1.0.0-rc2 2004.07.15 o Intel Software RAID discovery and activation support o allow more than one format handler name with --format o display "raid10" sets properly rather than just "mirror" o enhanced activate.c to handle partial activation of sets (eg, degraded RAID0) o enhanced command line option checks o implemented a library context for variables such as debug etc. o fixed memory leak in discover_partitions o fixed recursion in _find_set() o continued writing subsets in case we fail on one because of RAID1 o format handler template update o fixed dietlibc build o fixed shared library configure o use default_list_set() instead of &raid_sets where possible o name change of list_head members to the more commonly used 'list' o renamed msdos partition format handler to dos o lots of inline comments corrected/updated o streamlined tools/*.[ch] o moved get.*level() and get_status to metadata.[ch] and changed level name to type -- Regards, Heinz -- The LVM Guy -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Heinz Mauelshagen Red Hat GmbH Consulting Development Engineer Am Sonnenhang 11 56242 Marienrachdorf Germany Mauelshagen at RedHat.com +49 2626 141200 FAX 924446 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- From d.lelli at surrey.ac.uk Fri Jul 16 13:18:29 2004 From: d.lelli at surrey.ac.uk (Diego) Date: Fri, 16 Jul 2004 14:18:29 +0100 Subject: [Linux-cluster] Cluster admin Message-ID: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Hello everibody, I built a linux cluster running the RH 9. I'd like to know if is there any tool for the general cluster administration, such as add a new user to all the node, put a retrieve fil from the nodes and so on . I had a look to OSCAR, but I don't want to installl again all the machine that are already set-up to go. Many Thanks Diego --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.720 / Virus Database: 476 - Release Date: 14/07/2004 -------------- next part -------------- An HTML attachment was scrubbed... URL: From anton at hq.310.ru Fri Jul 16 08:47:49 2004 From: anton at hq.310.ru (=?Windows-1251?B?wO3y7u0gzeX17vDu+Oj1?=) Date: Fri, 16 Jul 2004 12:47:49 +0400 Subject: [Linux-cluster] patch for gfs, add suiddir option for mount Message-ID: <61425687.20040716124749@hq.310.ru> Hi linux-cluster, I apologize for the previous letter, here full patch suiddir option for mount man 8 mount (FreeBSD) *** gfs_ioctl.h.orig 2004-07-14 13:54:39.000000000 +0400 --- gfs_ioctl.h 2004-07-14 13:57:38.000000000 +0400 *************** *** 213,218 **** --- 213,219 ---- unsigned int ar_num_glockd; int ar_posixacls; /* Enable posix acls */ + int ar_suiddir; /* suiddir support */ }; #endif /* ___GFS_IOCTL_DOT_H__ */ *** inode.c.orig 2004-07-15 19:52:33.000000000 +0400 --- inode.c 2004-07-15 19:55:36.000000000 +0400 *************** *** 1132,1138 **** struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; ! unsigned int gid; int alloc_required; int error; --- 1132,1138 ---- struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; ! unsigned int gid, uid; int alloc_required; int error; *************** *** 1148,1162 **** else gid = current->fsgid; al = gfs_alloc_get(dip); error = gfs_quota_lock_m(dip, ! current->fsuid, gid); if (error) goto fail; ! 
error = gfs_quota_check(dip, current->fsuid, gid); if (error) goto fail_gunlock_q; --- 1148,1172 ---- else gid = current->fsgid; + if ( (sdp->sd_args.ar_suiddir == TRUE) + && (dip->i_di.di_mode & S_ISUID) ) { + if (type == GFS_FILE_DIR) + mode |= S_ISUID; + uid = dip->i_di.di_uid; + gid = dip->i_di.di_gid; + } + else + uid = current->fsuid; + al = gfs_alloc_get(dip); error = gfs_quota_lock_m(dip, ! uid, gid); if (error) goto fail; ! error = gfs_quota_check(dip, uid, gid); if (error) goto fail_gunlock_q; *************** *** 1206,1212 **** if (error) goto fail_end_trans; ! error = make_dinode(dip, gl, inum, type, mode, current->fsuid, gid); if (error) goto fail_end_trans; --- 1216,1222 ---- if (error) goto fail_end_trans; ! error = make_dinode(dip, gl, inum, type, mode, uid, gid); if (error) goto fail_end_trans; *** mount.c.orig 2004-06-24 12:53:28.000000000 +0400 --- mount.c 2004-07-14 13:59:36.000000000 +0400 *************** *** 110,115 **** --- 110,118 ---- else if (!strcmp(x, "upgrade")) args->ar_upgrade = TRUE; + else if (!strcmp(x, "suiddir")) + args->ar_suiddir = TRUE; + else if (!strcmp(x, "num_glockd")) { if (!y) { printk("GFS: need argument to num_glockd\n"); -- e-mail: anton at hq.310.ru http://www.310.ru From john.hearns at clustervision.com Fri Jul 16 14:48:25 2004 From: john.hearns at clustervision.com (John Hearns) Date: Fri, 16 Jul 2004 15:48:25 +0100 Subject: [Linux-cluster] Cluster admin In-Reply-To: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> References: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Message-ID: <1089989304.14987.12.camel@vigor12> On Fri, 2004-07-16 at 14:18, Diego wrote: > Hello everibody, > > I built a linux cluster running the RH 9. > > I'd like to know if is there any tool for the general cluster > administration, such > as add a new user to all the node, put a retrieve fil from the nodes > and so on . For that sort of cluster, you would get better advice on the Beowulf list. There are parallel utilities around, including one which runs parallel terminal sessions (I forget the name for the moment). Our own clustering environment ncludes extensive parallel tools, for parallel command execution, syncing, shutdown, power control. To add users, you can use NIS or LDAP. Having nine machines is a good start, and a good introduction. However, if you don;t have some sort of clustering framework then re-installing, installing more machines and upgrading will be a pain. From zleite at its.caltech.edu Fri Jul 16 16:03:55 2004 From: zleite at its.caltech.edu (Zailo Leite) Date: Fri, 16 Jul 2004 09:03:55 -0700 Subject: [Linux-cluster] LVM2+GNBD? Message-ID: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Can I make a LVM2 logical volume using GNBD imported block devices from, say 2 GNDB servers, then make a GFS device out of the LVM volume? I'm building a test rig for trying it, but if someone knows that it won't work, I'd appreciate the head's up... From notiggy at gmail.com Fri Jul 16 18:23:59 2004 From: notiggy at gmail.com (Brian Jackson) Date: Fri, 16 Jul 2004 13:23:59 -0500 Subject: [Linux-cluster] LVM2+GNBD? In-Reply-To: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> References: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Message-ID: On Fri, 16 Jul 2004 09:03:55 -0700, Zailo Leite wrote: > Can I make a LVM2 logical volume using GNBD imported block devices from, > say 2 GNDB servers, then make a GFS device out of the LVM volume? > I'm building a test rig for trying it, but if someone knows that it > won't work, I'd appreciate the head's up... 
It should work fine. To the system gnbd is just another block device. You are limited to the different raid levels you can use in a shared device situation though. Just something to keep in mind.

--Brian Jackson

From danderso at redhat.com Fri Jul 16 18:36:45 2004 From: danderso at redhat.com (Derek Anderson) Date: Fri, 16 Jul 2004 13:36:45 -0500 Subject: [Linux-cluster] LVM2+GNBD? In-Reply-To: References: <1089993835.7257.13.camel@DHCP-152-86.caltech.edu> Message-ID: <200407161336.45390.danderso@redhat.com>

On Friday 16 July 2004 13:23, Brian Jackson wrote: > On Fri, 16 Jul 2004 09:03:55 -0700, Zailo Leite wrote: > > Can I make a LVM2 logical volume using GNBD imported block devices from, > > say 2 GNDB servers, then make a GFS device out of the LVM volume? > > I'm building a test rig for trying it, but if someone knows that it > > won't work, I'd appreciate the head's up...

You need to add the line: types = [ "gnbd", 1 ] to the devices section of the /etc/lvm/lvm.conf for lvm to scan for GNBD devices.

> > It should work fine. To the system gnbd is just another block device. > You are limited to the different raid levels you can use in a shared > device situation though. Just something to keep in mind. > > --Brian Jackson > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster

From lhh at redhat.com Fri Jul 16 19:09:13 2004 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 16 Jul 2004 15:09:13 -0400 Subject: [Linux-cluster] rgmanager pre-commit Message-ID: <1090004953.3699.24.camel@atlantis.boston.redhat.com>

Here's the RM I've been working on: http://people.redhat.com/lhh/rgmanager-1.3.0.tar.gz

README (ie, we know it's broken; read this first): http://people.redhat.com/lhh/README.rgmanager

Example stuff: http://people.redhat.com/lhh/cluster.xml

It's *not* stable (= barely runs in some cases), but should give insight as to where we were going with it. I will be OOTO next week, so let the forest fire begin. It will be a few more weeks before this is integrated into the main project, and the name is probably going to change as well as some of the thread nastiness. Please read the README first.

Basically, tree-structured, user-configurable resource groups - similar to the way clumanager 1.x handled them, only a lot more flexible - and if you don't like the way they're structured, you can edit the rule sets defining how they're structured to your liking.

/me revs up his ZX6

-- Lon

From kpreslan at redhat.com Fri Jul 16 22:20:36 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Fri, 16 Jul 2004 17:20:36 -0500 Subject: [Linux-cluster] patch for gfs, add suiddir option for mount In-Reply-To: <61425687.20040716124749@hq.310.ru> References: <61425687.20040716124749@hq.310.ru> Message-ID: <20040716222036.GA31057@potassium.msp.redhat.com>

On Fri, Jul 16, 2004 at 12:47:49PM +0400, Anton Nekhoroshikh wrote: > Hi linux-cluster, > > I apologize for the previous letter, here full patch > > suiddir option for mount > man 8 mount (FreeBSD) > >

Your patch didn't actually follow the FreeBSD man page's description of suiddir. From: http://www.freebsd.org/cgi/man.cgi?query=mount&sektion=8&apropos=0&manpath=FreeBSD+5.2-RELEASE+and+Ports

A directory on the mounted file system will respond to the SUID bit being set, by setting the owner of any new files to be the same as the owner of the directory. New directories will inherit the bit from their parents. Execute bits are removed from the file, and it will not be given to root.
Note the last sentence. The below patch is acceptable to me. Is it ok with you? diff -urN crap1/gfs-kernel/src/gfs/gfs_ioctl.h crap2/gfs-kernel/src/gfs/gfs_ioctl.h --- crap1/gfs-kernel/src/gfs/gfs_ioctl.h 24 Jun 2004 08:53:27 -0000 1.1 +++ crap2/gfs-kernel/src/gfs/gfs_ioctl.h 16 Jul 2004 22:13:05 -0000 @@ -213,6 +213,7 @@ unsigned int ar_num_glockd; int ar_posixacls; /* Enable posix acls */ + int ar_suiddir; /* suiddir support */ }; #endif /* ___GFS_IOCTL_DOT_H__ */ diff -urN crap1/gfs-kernel/src/gfs/inode.c crap2/gfs-kernel/src/gfs/inode.c --- crap1/gfs-kernel/src/gfs/inode.c 16 Jul 2004 22:07:02 -0000 1.3 +++ crap2/gfs-kernel/src/gfs/inode.c 16 Jul 2004 22:13:05 -0000 @@ -1132,16 +1132,26 @@ struct posix_acl *acl = NULL; struct gfs_alloc *al; struct gfs_inode *ip; - unsigned int gid; + unsigned int uid, gid; int alloc_required; int error; + if (sdp->sd_args.ar_suiddir && + (dip->i_di.di_mode & S_ISUID) && + dip->i_di.di_uid) { + if (type == GFS_FILE_DIR) + mode |= S_ISUID; + else if (dip->i_di.di_uid != current->fsuid) + mode &= ~07111; + uid = dip->i_di.di_uid; + } else + uid = current->fsuid; + if (dip->i_di.di_mode & S_ISGID) { if (type == GFS_FILE_DIR) mode |= S_ISGID; gid = dip->i_di.di_gid; - } - else + } else gid = current->fsgid; error = gfs_setup_new_acl(dip, type, &mode, &acl); @@ -1150,13 +1160,11 @@ al = gfs_alloc_get(dip); - error = gfs_quota_lock_m(dip, - current->fsuid, - gid); + error = gfs_quota_lock_m(dip, uid, gid); if (error) goto fail; - error = gfs_quota_check(dip, current->fsuid, gid); + error = gfs_quota_check(dip, uid, gid); if (error) goto fail_gunlock_q; @@ -1206,13 +1214,13 @@ if (error) goto fail_end_trans; - error = make_dinode(dip, gl, inum, type, mode, current->fsuid, gid); + error = make_dinode(dip, gl, inum, type, mode, uid, gid); if (error) goto fail_end_trans; al->al_ul = gfs_trans_add_unlinked(sdp, GFS_LOG_DESC_IDA, &(struct gfs_inum){0, inum->no_addr}); - gfs_trans_add_quota(sdp, +1, current->fsuid, gid); + gfs_trans_add_quota(sdp, +1, uid, gid); /* Gfs_inode_get() can't fail here. But then again, it shouldn't be here (it should be in gfs_createi()). Gfs_init_acl() has no diff -urN crap1/gfs-kernel/src/gfs/mount.c crap2/gfs-kernel/src/gfs/mount.c --- crap1/gfs-kernel/src/gfs/mount.c 24 Jun 2004 08:53:28 -0000 1.1 +++ crap2/gfs-kernel/src/gfs/mount.c 16 Jul 2004 22:13:05 -0000 @@ -128,6 +128,9 @@ else if (!strcmp(x, "acl")) args->ar_posixacls = TRUE; + else if (!strcmp(x, "suiddir")) + args->ar_suiddir = TRUE; + /* Unknown */ else { -- Ken Preslan From kazutomo at powercockpit.net Sat Jul 17 04:09:50 2004 From: kazutomo at powercockpit.net (Kazutomo Yoshii) Date: Fri, 16 Jul 2004 21:09:50 -0700 Subject: [Linux-cluster] Cluster admin In-Reply-To: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> References: <001001c46b37$6309daa0$77b4e383@Mmepcfluids08> Message-ID: <40F8A68E.5000107@powercockpit.net> Hi, > Hello everibody, > > I built a linux cluster running the RH 9. > > I'd like to know if is there any tool for the general cluster > administration, such > as add a new user to all the node, put a retrieve fil from the nodes > and so on . > NIS or openldap may be good for managing user in cluster. If you want to do arbitrary operation to entirer cluster, you may need cluster-wise shell such as http://sourceforge.net/projects/clusterssh/ I'm also working on similar tool. Thanks, Kaz -- My PowerCockpit page: http://powercockpit.net/hacking/ > I had a look to OSCAR, but I don't want to installl again all the > machine that are > already set-up to go. 
> > Many Thanks > > Diego > > --- > Outgoing mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.720 / Virus Database: 476 - Release Date: 14/07/2004 > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > > From chloong at nextnationnet.com Mon Jul 19 09:09:37 2004 From: chloong at nextnationnet.com (chloong) Date: Mon, 19 Jul 2004 17:09:37 +0800 Subject: [Linux-cluster] unresolved symbol Message-ID: <40FB8FD1.6070902@nextnationnet.com> hi all, I am facing this unresolved symbol error when i do a depmod -a for gfs. i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile under 2.4.21-15.0.3.EL version where as the one that i downloaded from RedHat is only be able to compile under 2.4.21-15.EL version. I was able to compile it and installed but when i run depmod -a it complain that there are unresolved symbol for all the GFS modules.... depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/block/gnbd/gnbd.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/block/gnbd/gnbd_serv. o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/drivers/md/pool/pool.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs/gfs.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_gulm/lock _gulm.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_harness/l ock_harness.o depmod: *** Unresolved symbols in /lib/modules/2.4.21-15.0.3.EL/kernel/fs/gfs_locking/lock_nolock/lo ck_nolock.o I checked that my kernel version is correct using uname -r... How could i go about this....? Please help.... Thanks! From mailing-lists at hughesjr.com Mon Jul 19 10:00:39 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 19 Jul 2004 05:00:39 -0500 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <40FB8FD1.6070902@nextnationnet.com> References: <40FB8FD1.6070902@nextnationnet.com> Message-ID: <1090231239.10085.13.camel@Myth.home.local> On Mon, 2004-07-19 at 04:09, chloong wrote: > hi all, > I am facing this unresolved symbol error when i do a depmod -a for gfs. > > i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using > GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile > under 2.4.21-15.0.3.EL version where as the one that i downloaded from > RedHat is only be able to compile under 2.4.21-15.EL version. > > I was able to compile it and installed but when i run depmod -a it > complain that there are unresolved symbol for all the GFS modules.... Did you compile the version for your arch (i686, athlon)? If you are running a i686 kernel, you need to compile with a target=i686 or if you have an athlon kernel, you need to compile with target=athlon You can download the binary modules from my site as well and see if you have the same problem. In order to compile the SRPM as written for a target=athlon you will need to install (at least for the compile), kernel-unsupported, kernel-smp, and kernel-source ... for target=i686 you need to install kernel-unsupported, kernel-smp, kernel-source, and kernel-hugemem Johnny Hughes HughesJR.com > I checked that my kernel version is correct using uname -r... > > How could i go about this....? 
Please help.... From chloong at nextnationnet.com Mon Jul 19 10:21:38 2004 From: chloong at nextnationnet.com (chloong) Date: Mon, 19 Jul 2004 18:21:38 +0800 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <1090231239.10085.13.camel@Myth.home.local> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> Message-ID: <40FBA0B2.2000700@nextnationnet.com> Johnny Hughes wrote: >On Mon, 2004-07-19 at 04:09, chloong wrote: > > >>hi all, >>I am facing this unresolved symbol error when i do a depmod -a for gfs. >> >>i am using kernel 2.4.21-15.0.3.EL. I actually re-compile the GFS using >>GFS-6.0.0-1.2.src.rpm from HughesJR.com where it is able to compile >>under 2.4.21-15.0.3.EL version where as the one that i downloaded from >>RedHat is only be able to compile under 2.4.21-15.EL version. >> >>I was able to compile it and installed but when i run depmod -a it >>complain that there are unresolved symbol for all the GFS modules.... >> >> > >Did you compile the version for your arch (i686, athlon)? > >If you are running a i686 kernel, you need to compile with a target=i686 >or if you have an athlon kernel, you need to compile with target=athlon > >You can download the binary modules from my site as well and see if you >have the same problem. > >In order to compile the SRPM as written for a target=athlon you will >need to install (at least for the compile), kernel-unsupported, >kernel-smp, and kernel-source ... for target=i686 you need to install >kernel-unsupported, kernel-smp, kernel-source, and kernel-hugemem > >Johnny Hughes >HughesJR.com > > > >>I checked that my kernel version is correct using uname -r... >> >>How could i go about this....? Please help.... >> >> > > > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > > > > hi Johnny, I had tried using your bin, both i386 & i686, but still have the same problem. BTW, how could i know what target should i use? i am running on a x86 platform and i compile the kernel myself and i select the cpu type as 386 family with no smp support. the kernel source i downloaded from rpmfind. The version is kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file provided from this rpm and changed it to no smp support. Everything are fine after reboot using this kernel. Then i compile the gfs source from your side. The compilation was successful. After installation, when i do a depmod -a, it still gave me unresolved symbol for gfs modules.... Please help! Thanks From mailing-lists at hughesjr.com Mon Jul 19 12:01:40 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 19 Jul 2004 07:01:40 -0500 Subject: [Linux-cluster] unresolved symbol In-Reply-To: <40FBA0B2.2000700@nextnationnet.com> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> <40FBA0B2.2000700@nextnationnet.com> Message-ID: <1090238500.5864.21.camel@Myth.home.local> On Mon, 2004-07-19 at 05:21, chloong wrote: > hi Johnny, > I had tried using your bin, both i386 & i686, but still have the same > problem. > BTW, how could i know what target should i use? i am running on a x86 > platform and i compile the kernel myself and i select the cpu type as > 386 family with no smp support. > > the kernel source i downloaded from rpmfind. The version is > kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file > provided from this rpm and changed it to no smp support. > OK, did you build a kernel rpm and install it ... 
if so, what was the name of the kernel's rpm. Is this on RHEL or a clone like WBEL/CentOS/TaoLinux? > Everything are fine after reboot using this kernel. Then i compile the > gfs source from your side. The compilation was successful. After > installation, when i do a depmod -a, it still gave me unresolved symbol > for gfs modules.... > I installed: GFS-6.0.0-1.2.i686.rpm GFS-devel-6.0.0-1.2.i686.rpm GFS-modules-6.0.0-1.2.i686.rpm perl-Net-Telnet-3.03-2.noarch.rpm Then do a: depmod -a No errors... Johnny Hughes HughesJR.com From merlin at studiobz.it Mon Jul 19 15:40:40 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 17:40:40 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted Message-ID: <40FBEB78.8040305@studiobz.it> Hi to all. I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, but I have a big problem when I try to export a device with GNBD: ...I've done all the steps in https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage ...but when I try gnbd_export -v -e export1 -d /dev/sdb1 it fails with: --- receiver: ERROR cannot connect to cluster manager : Operation not permitted gnbd_export: ERROR gnbd_clusterd failed --- looking at the log I have found this message: --- Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted --- What can I do? Where can I find a little explanation of the cluster.xml file ? Thanks, Christian From jbrassow at redhat.com Mon Jul 19 16:06:10 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Mon, 19 Jul 2004 11:06:10 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> Christian, I see that there are new binaries that come with gnbd now (like gnbd_monitor and gnbd_clusterd). I have not used gnbd since it has changed, and it might be a good guess to say that the documentation is out of date. Ben M is not in right now, but he would be able to answer your question. I'll try to make sure he gets this. brassow On Jul 19, 2004, at 10:40 AM, Christian Zoffoli wrote: > > Hi to all. > I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not > permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] > cannot connect to cluster manager : Operation not permitted > --- > > > What can I do? > Where can I find a little explanation of the cluster.xml file ? 
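On the cluster.xml question above: doc/usage.txt in the cluster CVS tree carries the reference example, and that is the place to check. Purely as an illustration of the general shape, here is a from-memory sketch; it is not the poster's file and not an authoritative schema, element and attribute names varied between CVS snapshots, and every name, address and fence agent below is a placeholder:

    <?xml version="1.0"?>
    <!-- illustrative skeleton only; compare against doc/usage.txt in your checkout -->
    <cluster name="example" config_version="1">
      <cman>
      </cman>
      <nodes>
        <node name="node1" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.0.0.1"/>
            </method>
          </fence>
        </node>
        <node name="node2" votes="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="10.0.0.2"/>
            </method>
          </fence>
        </node>
      </nodes>
      <fence_devices>
        <device name="human" agent="fence_manual"/>
      </fence_devices>
    </cluster>

The cluster name and config_version later show up in /proc/cluster/status, which is a quick way to confirm that the file ccsd picked up is the one you intended.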
> > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From merlin at studiobz.it Mon Jul 19 18:13:38 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 20:13:38 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> References: <40FBEB78.8040305@studiobz.it> <8CD7B6F8-D99D-11D8-ACF6-000A957BB1F6@redhat.com> Message-ID: <40FC0F52.6020508@studiobz.it> Jonathan E Brassow wrote: [cut] > > Ben M is not in right now, but he would be able to answer your > question. I'll try to make sure he gets this. thanks, I'm very interested to make extensive tests on the new code. Christian From amir at datacore.ch Mon Jul 19 18:45:06 2004 From: amir at datacore.ch (Amir Guindehi) Date: Mon, 19 Jul 2004 20:45:06 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <40FC16B2.5070703@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Christian, | Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot | connect to cluster manager : Operation not permitted Did you start gnbd_serv? If so, check the permissions of /dev/gnbd_ctl. Eventually the are wrong (they where here). If you run the 'gnbd_export' with 'strace' you will see more. I remember that using devfs one needs something along the following lines in /etc/devfs.d/gnbd: # # GNBD # gnbd needs crw------- on /dev/gnbd_ctl # REGISTER ^gnbd_ctl PERMISSIONS root.root 600 Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA/BaxbycOjskSVCwRAp+WAJwImq2LK4NvQJirXpztKLRu+d4+8ACeO8ie G4XXlqrtTMT5Wi/116uoE0M= =fNie -----END PGP SIGNATURE----- From merlin at studiobz.it Mon Jul 19 19:26:55 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 21:26:55 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FC16B2.5070703@datacore.ch> References: <40FBEB78.8040305@studiobz.it> <40FC16B2.5070703@datacore.ch> Message-ID: <40FC207F.9040301@studiobz.it> Amir Guindehi wrote: [cut] > Did you start gnbd_serv? yes > If so, check the permissions of /dev/gnbd_ctl. Eventually the are wrong > (they where here). If you run the 'gnbd_export' with 'strace' you will > see more. permissions seems correct here crw------- thanks for the infos. Christian From danderso at redhat.com Mon Jul 19 19:46:43 2004 From: danderso at redhat.com (Derek Anderson) Date: Mon, 19 Jul 2004 14:46:43 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <200407191446.43030.danderso@redhat.com> Christian, You need to execute the cluster setup steps on the page you linked below (previous to the GNBD-specific sections). Specifically, on each node you need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join after you have cman quorum. Then you should be able to gnbd_export devices. On Monday 19 July 2004 10:40, Christian Zoffoli wrote: > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > --- > > > What can I do? > Where can I find a little explanation of the cluster.xml file ? > > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From phillips at redhat.com Mon Jul 19 20:13:37 2004 From: phillips at redhat.com (Daniel Phillips) Date: Mon, 19 Jul 2004 16:13:37 -0400 Subject: [Linux-cluster] GFS limits? In-Reply-To: References: <40F460BB.4040603@smugmug.com> Message-ID: <200407191613.38768.phillips@redhat.com> On Tuesday 13 July 2004 18:50, Brian Jackson wrote: > > Does GFS intelligently "spread > > the load" among multiple storage entities for writing under high > > load? > > No, each node that mounts has direct access to the storage. It writes > just like any other fs, when it can. Hi Brian, He can do that at the block device level, with a device-mapper "striped" target. Regards, Daniel From merlin at studiobz.it Mon Jul 19 20:35:50 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 22:35:50 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <200407191446.43030.danderso@redhat.com> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> Message-ID: <40FC30A6.4060808@studiobz.it> Derek Anderson wrote: > Christian, > > You need to execute the cluster setup steps on the page you linked below > (previous to the GNBD-specific sections). Specifically, on each node you > need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join > after you have cman quorum. Then you should be able to gnbd_export devices. 
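Spelled out as commands, the sequence Derek lists above comes to roughly the following on each node; this is only a sketch, the gnbd_export line reuses the example device from earlier in the thread, and polling /proc/cluster/status is just one way to watch for quorum:

    modprobe lock_dlm            # load the DLM lock module (cman/dlm/lock_harness come in with it)
    ccsd                         # start the config daemon (needs cluster.xml in place)
    cman_tool join               # join or form the cluster
    cat /proc/cluster/status     # repeat until "Membership state: Cluster-Member" and quorate
    fence_tool join              # join the fence domain
    gnbd_export -v -e export1 -d /dev/sdb1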
I have done all the steps but I found these errors in the logs: node1 (GFS1) ----- Jul 20 01:29:02 gfs1 Lock_Harness (built Jul 17 2004 22:54:18) installed Jul 20 01:29:02 gfs1 CMAN (built Jul 17 2004 22:59:43) installed Jul 20 01:29:02 gfs1 NET: Registered protocol family 31 Jul 20 01:29:02 gfs1 DLM (built Jul 18 2004 00:08:18) installed Jul 20 01:29:02 gfs1 Lock_DLM (built Jul 17 2004 22:53:49) installed Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data available Jul 20 01:29:03 gfs1 CMAN: Waiting to join or form a Linux-cluster Jul 20 01:29:14 gfs1 CMAN: forming a new cluster Jul 20 01:29:14 gfs1 CMAN: quorum regained, resuming activity Jul 20 01:29:34 gfs1 CMAN: got node gfs2 Jul 20 01:30:06 gfs1 gnbd: registered device at major 253 Jul 20 01:30:38 gfs1 gnbd_serv[8039]: startup succeeded Jul 20 01:30:38 gfs1 receiver[8043]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted ----- node2 (GFS2) ----- Jul 20 01:15:24 gfs2 Lock_Harness (built Jul 17 2004 22:54:18) installed Jul 20 01:15:24 gfs2 CMAN (built Jul 17 2004 22:59:43) installed Jul 20 01:15:24 gfs2 NET: Registered protocol family 31 Jul 20 01:15:24 gfs2 DLM (built Jul 18 2004 00:07:29) installed Jul 20 01:15:24 gfs2 Lock_DLM (built Jul 17 2004 22:53:49) installed Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data available Jul 20 01:15:51 gfs2 CMAN: Waiting to join or form a Linux-cluster Jul 20 01:15:55 gfs2 CMAN: sending membership request Jul 20 01:15:55 gfs2 CMAN: got node gfs1 Jul 20 01:15:55 gfs2 CMAN: quorum regained, resuming activity Jul 20 01:16:28 gfs2 gnbd: registered device at major 253 Jul 20 01:16:52 gfs2 gnbd_serv[8025]: startup succeeded Jul 20 01:17:03 gfs2 receiver[8029]: ERROR [gnbd_clusterd.c:53] cannot connect to cluster manager : Operation not permitted ----- here is the cluster.xml file: ----- ----- Christian From danderso at redhat.com Mon Jul 19 21:25:52 2004 From: danderso at redhat.com (Derek Anderson) Date: Mon, 19 Jul 2004 16:25:52 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FC30A6.4060808@studiobz.it> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> <40FC30A6.4060808@studiobz.it> Message-ID: <200407191625.52391.danderso@redhat.com> Christian, I tried it again with your config file and it is working for me. What do the /proc/cluster/nodes, /proc/cluster/services, and /proc/cluster/status files look like on the nodes? On Monday 19 July 2004 15:35, Christian Zoffoli wrote: > Derek Anderson wrote: > > Christian, > > > > You need to execute the cluster setup steps on the page you linked below > > (previous to the GNBD-specific sections). Specifically, on each node you > > need to run: modprobe lock_dlm, ccsd, cman_tool join, and fence_tool join > > after you have cman quorum. Then you should be able to gnbd_export > > devices. 
> > I have done all the steps but I found these errors in the logs: > > node1 (GFS1) > ----- > Jul 20 01:29:02 gfs1 Lock_Harness (built Jul 17 2004 22:54:18) > installed > Jul 20 01:29:02 gfs1 CMAN (built Jul 17 2004 22:59:43) installed > Jul 20 01:29:02 gfs1 NET: Registered protocol family 31 > Jul 20 01:29:02 gfs1 DLM (built Jul 18 2004 00:08:18) installed > Jul 20 01:29:02 gfs1 Lock_DLM (built Jul 17 2004 22:53:49) installed > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:02 gfs1 ccsd[6723]: Error while processing get: No data > available > Jul 20 01:29:03 gfs1 CMAN: Waiting to join or form a Linux-cluster > Jul 20 01:29:14 gfs1 CMAN: forming a new cluster > Jul 20 01:29:14 gfs1 CMAN: quorum regained, resuming activity > Jul 20 01:29:34 gfs1 CMAN: got node gfs2 > Jul 20 01:30:06 gfs1 gnbd: registered device at major 253 > Jul 20 01:30:38 gfs1 gnbd_serv[8039]: startup succeeded > Jul 20 01:30:38 gfs1 receiver[8043]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > ----- > > > node2 (GFS2) > ----- > Jul 20 01:15:24 gfs2 Lock_Harness (built Jul 17 2004 22:54:18) > installed > Jul 20 01:15:24 gfs2 CMAN (built Jul 17 2004 22:59:43) installed > Jul 20 01:15:24 gfs2 NET: Registered protocol family 31 > Jul 20 01:15:24 gfs2 DLM (built Jul 18 2004 00:07:29) installed > Jul 20 01:15:24 gfs2 Lock_DLM (built Jul 17 2004 22:53:49) installed > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 ccsd[6724]: Error while processing get: No data > available > Jul 20 01:15:51 gfs2 CMAN: Waiting to join or form a Linux-cluster > Jul 20 01:15:55 gfs2 CMAN: sending membership request > Jul 20 01:15:55 gfs2 CMAN: got node gfs1 > Jul 20 01:15:55 gfs2 CMAN: quorum regained, resuming activity > Jul 20 01:16:28 gfs2 gnbd: registered device at major 253 > Jul 20 01:16:52 gfs2 gnbd_serv[8025]: startup succeeded > Jul 20 01:17:03 gfs2 receiver[8029]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > ----- > > > here is the cluster.xml file: > ----- > > > > > > > > > > > > > > > > > > > > > > > > > > > > ----- > > > Christian From merlin at studiobz.it Mon Jul 19 21:27:58 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Mon, 19 Jul 2004 23:27:58 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <200407191625.52391.danderso@redhat.com> References: <40FBEB78.8040305@studiobz.it> <200407191446.43030.danderso@redhat.com> <40FC30A6.4060808@studiobz.it> <200407191625.52391.danderso@redhat.com> Message-ID: <40FC3CDE.6050104@studiobz.it> Derek Anderson wrote: > Christian, > > I tried it again with your config file and it is working for me. What do the > /proc/cluster/nodes, /proc/cluster/services, and /proc/cluster/status files > look like on the nodes? 
----- gfs1 root # cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M gfs1 2 1 1 M gfs2 ----- ----- gfs2 root # cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M gfs1 2 1 1 M gfs2 ----- ----- gfs1 root # cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2] ----- ----- gfs2 root # cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [1 2] ----- ----- gfs1 root # cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 1 Node addresses: 10.0.4.101 ----- ----- gfs2 root # cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 1 Node addresses: 10.0.4.10 ----- From chloong at nextnationnet.com Tue Jul 20 02:11:59 2004 From: chloong at nextnationnet.com (chloong) Date: Tue, 20 Jul 2004 10:11:59 +0800 Subject: [Linux-cluster] unresolved symbol solved! In-Reply-To: <1090238500.5864.21.camel@Myth.home.local> References: <40FB8FD1.6070902@nextnationnet.com> <1090231239.10085.13.camel@Myth.home.local> <40FBA0B2.2000700@nextnationnet.com> <1090238500.5864.21.camel@Myth.home.local> Message-ID: <40FC7F6F.3090300@nextnationnet.com> Jonny, Thanks a lot man! I managed to install GFS and run it. Actually i used back the kernel-2.4.21-15.EL for smp. As follow what you said, installed all kernel-source, kernel-hugemem, kernel-unsupported, kernel-smp and then recompile from src. Now no more unresolved symbol and able to modprobe all the modules. Need to configure GFS now. Thanks again man! Johnny Hughes wrote: >On Mon, 2004-07-19 at 05:21, chloong wrote: > > >>hi Johnny, >>I had tried using your bin, both i386 & i686, but still have the same >>problem. >>BTW, how could i know what target should i use? i am running on a x86 >>platform and i compile the kernel myself and i select the cpu type as >>386 family with no smp support. >> >>the kernel source i downloaded from rpmfind. The version is >>kernel-2.4.21-15.0.3.EL.src.rpm. I used back the kernel config file >>provided from this rpm and changed it to no smp support. >> >> >> >OK, did you build a kernel rpm and install it ... if so, what was the >name of the kernel's rpm. > >Is this on RHEL or a clone like WBEL/CentOS/TaoLinux? > > > >>Everything are fine after reboot using this kernel. Then i compile the >>gfs source from your side. The compilation was successful. After >>installation, when i do a depmod -a, it still gave me unresolved symbol >>for gfs modules.... >> >> >> >I installed: >GFS-6.0.0-1.2.i686.rpm >GFS-devel-6.0.0-1.2.i686.rpm >GFS-modules-6.0.0-1.2.i686.rpm >perl-Net-Telnet-3.03-2.noarch.rpm > >Then do a: > >depmod -a > >No errors... > >Johnny Hughes >HughesJR.com > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teigland at redhat.com Tue Jul 20 04:14:42 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 20 Jul 2004 12:14:42 +0800 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <20040720041442.GA11189@redhat.com> On Mon, Jul 19, 2004 at 05:40:40PM +0200, Christian Zoffoli wrote: > > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted There have been a lot of people who have had this same problem. In general there is no reason for gnbd to have any relation to clustering. This begs the question, why is there a "gnbd_cluster" thread trying to talk with a cluster manager? IMO it's unfortunate that gnbd is doing this at all, much more so by default, causing unnecessary problems for so many people. AFAIK, the only way to prevent gnbd from doing this is to use the "-c Enable caching" flag for gnbd_export. Try using that flag and see if it helps. Now a feeble attempt to answer the question above. When you do a "non-caching" export (don't use -c), gnbd assumes that it also needs to talk with a cluster manager because it assumes that you are going to use two gnbd servers to export the same (shared) underlying block device. This also assumes that the clients are using some form of multi-pathing in their volume manager. I may be wrong on some of that, but it's clearly not the way most people use gnbd -- people usually have SAN's precisely to avoid using gnbd, not to do gnbd multi-pathing. [If anything, people want to do mirroring between gnbd servers, not fail-over. Fail-over may be useful for some people, but I'd hope it could be done without making gnbd itself impossibly convoluted.] -- Dave Teigland From chloong at nextnationnet.com Tue Jul 20 12:13:15 2004 From: chloong at nextnationnet.com (chloong) Date: Tue, 20 Jul 2004 20:13:15 +0800 Subject: [Linux-cluster] unable to mount gfs partition Message-ID: <40FD0C5B.9060601@nextnationnet.com> hi all, I managed to setup the whole gfs clustering. i have 2 nodes servers in this gfs cluster. 1 node is mounting the gfs partition without any issue but the other one not able to mount...giving me error: #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 mount: wrong fs type, bad option, bad superblock on /dev/pool/smsgateclu_pool0, or too many mounted file systems can anyone facing this problem? Please help! Thanks From rmayhew at mweb.com Tue Jul 20 15:02:55 2004 From: rmayhew at mweb.com (Richard Mayhew) Date: Tue, 20 Jul 2004 17:02:55 +0200 Subject: [Linux-cluster] unable to mount gfs partition Message-ID: <91C4F1A7C418014D9F88E938C135545860A1A8@mwjdc2.mweb.com> Hi, Are all your Daemons running and functioning correctly (specially the lock_gulm daemon) Have you assembled your pool device? -- Regards Richard Mayhew Unix Specialist MWEB Business Tel: + 27 11 340 7200 Fax: + 27 11 340 7288 Website: www.mwebbusiness.co.za -----Original Message----- From: chloong [mailto:chloong at nextnationnet.com] Sent: 20 July 2004 02:13 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] unable to mount gfs partition hi all, I managed to setup the whole gfs clustering. i have 2 nodes servers in this gfs cluster. 
1 node is mounting the gfs partition without any issue but the other one not able to mount...giving me error: #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 mount: wrong fs type, bad option, bad superblock on /dev/pool/smsgateclu_pool0, or too many mounted file systems can anyone facing this problem? Please help! Thanks -- Linux-cluster mailing list Linux-cluster at redhat.com http://www.redhat.com/mailman/listinfo/linux-cluster From notiggy at gmail.com Tue Jul 20 15:44:25 2004 From: notiggy at gmail.com (Brian Jackson) Date: Tue, 20 Jul 2004 10:44:25 -0500 Subject: [Linux-cluster] GFS limits? In-Reply-To: <200407191613.38768.phillips@redhat.com> References: <40F460BB.4040603@smugmug.com> <200407191613.38768.phillips@redhat.com> Message-ID: On Mon, 19 Jul 2004 16:13:37 -0400, Daniel Phillips wrote: > On Tuesday 13 July 2004 18:50, Brian Jackson wrote: > > > Does GFS intelligently "spread > > > the load" among multiple storage entities for writing under high > > > load? > > > > No, each node that mounts has direct access to the storage. It writes > > just like any other fs, when it can. > > Hi Brian, > > He can do that at the block device level, with a device-mapper "striped" > target. True but that's not very intelligent. I thought he meant some kind of hot spot tracking or something similar. --Brian > > Regards, > > Daniel > From amanthei at redhat.com Tue Jul 20 15:57:19 2004 From: amanthei at redhat.com (Adam Manthei) Date: Tue, 20 Jul 2004 10:57:19 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FD0C5B.9060601@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> Message-ID: <20040720155719.GD3866@redhat.com> On Tue, Jul 20, 2004 at 08:13:15PM +0800, chloong wrote: > hi all, > I managed to setup the whole gfs clustering. i have 2 nodes servers in > this gfs cluster. > > 1 node is mounting the gfs partition without any issue but the other one > not able to mount...giving me error: > #mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 > mount: wrong fs type, bad option, bad superblock on > /dev/pool/smsgateclu_pool0, > or too many mounted file systems > > can anyone facing this problem? This is the standard error message that mount gives on error. In general it isn't very usefull. More accurate error messages are on the console. Post your `dmesg` output if you are still having problems. -- Adam Manthei From bmarzins at redhat.com Tue Jul 20 17:12:35 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Tue, 20 Jul 2004 12:12:35 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FBEB78.8040305@studiobz.it> References: <40FBEB78.8040305@studiobz.it> Message-ID: <20040720171235.GG23619@phlogiston.msp.redhat.com> On Mon, Jul 19, 2004 at 05:40:40PM +0200, Christian Zoffoli wrote: > > Hi to all. 
> I have compiled and installed all the stuff in cvs on a vanilla 2.6.7, > but I have a big problem when I try to export a device with GNBD: > > > ...I've done all the steps in > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS.GNBD.Usage > > > ...but when I try > > gnbd_export -v -e export1 -d /dev/sdb1 > > it fails with: > > --- > receiver: ERROR cannot connect to cluster manager : Operation not permitted > gnbd_export: ERROR gnbd_clusterd failed > --- > > looking at the log I have found this message: > --- > Jul 19 20:28:13 gfs1 receiver[22551]: ERROR [gnbd_clusterd.c:53] cannot > connect to cluster manager : Operation not permitted > --- If you do not want to enable multipathing or run GFS on the gnbd server, you can just add a -c to your export line. Here's a guess at what you might be seeing. The behaviour that you are seeing looks like it could be caused by not having the correct magma plugins. In the cluster/magma/tests directory there is a cluster plugin test program, cpt, run # cpt null if you get something like Connect failure: Operation not permitted then either cman isn't running on the node, or magma cannot connect to it. If cman is running correctly (check you logs) Then look in /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need to install the magma plugins, which are located in /cluster/magma-plugins. -Ben > What can I do? > Where can I find a little explanation of the cluster.xml file ? > > > Thanks, > Christian > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From chloong at nextnationnet.com Wed Jul 21 01:54:36 2004 From: chloong at nextnationnet.com (chloong) Date: Wed, 21 Jul 2004 09:54:36 +0800 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <20040720155719.GD3866@redhat.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> Message-ID: <40FDCCDC.5040402@nextnationnet.com> Adam Manthei wrote: >On Tue, Jul 20, 2004 at 08:13:15PM +0800, chloong wrote: > > >>hi all, >>I managed to setup the whole gfs clustering. i have 2 nodes servers in >>this gfs cluster. >> >>1 node is mounting the gfs partition without any issue but the other one >>not able to mount...giving me error: >>#mount -t gfs /dev/pool/smsgateclu_pool0 /gfs1 >>mount: wrong fs type, bad option, bad superblock on >>/dev/pool/smsgateclu_pool0, >> or too many mounted file systems >> >>can anyone facing this problem? >> >> > >This is the standard error message that mount gives on error. In general it >isn't very usefull. More accurate error messages are on the console. Post >your `dmesg` output if you are still having problems. > > > hi, i checked the dmesg, the error is : lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = where as in /var/log/messages the error is : lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni ng? lock_gulm: ERROR cm_login failed. -111 lock_gulm: ERROR Got a -111 trying to start the threads. lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the other one is not. the one that not a lock_gulm server giving me mount error... Did i need to start the lock_gulm daemon on this server that is not the lock_gulm server? 
When i start the lock_gulmd on this server it gave me this error in /var/log/messages: lock_gulmd[18399]: You are running in Standard mode. lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) lock_gulmd[18399]: Forked core [18400]. lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: (clu1:192. 168.11.211) 1006:Not Allowed my cluster.ccs : cluster { name = "smsgateclu" lock_gulm { servers = ["clu1"] heartbeat_rate = 0.3 allowed_misses = 1 } } nodes.ccs: nodes { clu1 { ip_interfaces { eth2 = "192.168.11.211" } fence { human { admin { ipaddr = "192.168.11.211" } } } } clu2 { ip_interfaces { eth2 = "192.168.11.212" } fence { human { admin { ipaddr = "192.168.11.212" } } } } } fence.ccs: fence_devices { admin { agent = "fence_manual" } } Please help! Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mtilstra at redhat.com Wed Jul 21 15:02:39 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 21 Jul 2004 10:02:39 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FDCCDC.5040402@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> <40FDCCDC.5040402@nextnationnet.com> Message-ID: <20040721150239.GA20220@redhat.com> On Wed, Jul 21, 2004 at 09:54:36AM +0800, chloong wrote: > hi, > i checked the dmesg, the error is : > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > where as in /var/log/messages the error is : > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it > runni > ng? > lock_gulm: ERROR cm_login failed. -111 > lock_gulm: ERROR Got a -111 trying to start the threads. > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the > other one is not. > the one that not a lock_gulm server giving me mount error... > Did i need to start the lock_gulm daemon on this server that is not > the lock_gulm server? yes, you need to start lock_gulmd on every node. > When i start the lock_gulmd on this server it gave me this error in > /var/log/messages: > lock_gulmd[18399]: You are running in Standard mode. > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > lock_gulmd[18399]: Forked core [18400]. > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > (clu1:192. > 168.11.211) 1006:Not Allowed it might be marked expired. do a 'gulm_tool nodelist clu1' that will list what state gulm thinks each node is in. If it is marked expired, and given your ccs config, you'll need to complete the fence manual action. (erm, i forget how that's done, man pages should tell.) -- Michael Conrad Tadpol Tilstra The Grand Illusion: "I am in control!" -------------- next part -------------- A non-text attachment was scrubbed... 
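For reference, the check-and-acknowledge step Michael describes above usually looks something like this, run against the lock server; the gulm_tool call is the one he names, while the exact fence_ack_manual arguments are an assumption here -- fence_manual logs the precise command to run in /var/log/messages, so treat that message as authoritative:

    gulm_tool nodelist clu1        # shows the state gulm has for each node (Logged in, Expired, ...)
    # only after verifying the expired node really is down or has been reset:
    fence_ack_manual -s 192.168.11.212

The address given to fence_ack_manual would be that of the expired node (clu2 in this thread), assuming that syntax matches your build.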
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From danderso at redhat.com Wed Jul 21 15:14:49 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 21 Jul 2004 10:14:49 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <40FDCCDC.5040402@nextnationnet.com> References: <40FD0C5B.9060601@nextnationnet.com> <20040720155719.GD3866@redhat.com> <40FDCCDC.5040402@nextnationnet.com> Message-ID: <200407211014.49096.danderso@redhat.com> > hi, > i checked the dmesg, the error is : > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > where as in /var/log/messages the error is : > > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni > ng? > lock_gulm: ERROR cm_login failed. -111 > lock_gulm: ERROR Got a -111 trying to start the threads. > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > i got 2 nodes in the gfs cluster. 1 is the lock_gulm server and the > other one is not. > the one that not a lock_gulm server giving me mount error... > > Did i need to start the lock_gulm daemon on this server that is not the > lock_gulm server? > > When i start the lock_gulmd on this server it gave me this error in > /var/log/messages: > > lock_gulmd[18399]: You are running in Standard mode. > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > lock_gulmd[18399]: Forked core [18400]. > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > (clu1:192. > 168.11.211) 1006:Not Allowed > > my cluster.ccs : > > cluster { > name = "smsgateclu" > lock_gulm { > servers = ["clu1"] > heartbeat_rate = 0.3 > allowed_misses = 1 > } > } Like tadpol said in the last post, you are most likely expired. Where are people getting these ridiculously low examples of heartbeat_rate and allowed_misses? No wonder you're fenced. > > nodes.ccs: > > nodes { > clu1 { > ip_interfaces { > eth2 = "192.168.11.211" > } > fence { > human { > admin { > ipaddr = "192.168.11.211" > } > } > } > } > clu2 { > ip_interfaces { > eth2 = "192.168.11.212" > } > fence { > human { > admin { > ipaddr = "192.168.11.212" > } > } > } > } > } > > fence.ccs: > > fence_devices { > admin { > agent = "fence_manual" > } > } > > Please help! > > Thanks. From danderso at redhat.com Wed Jul 21 15:21:48 2004 From: danderso at redhat.com (Derek Anderson) Date: Wed, 21 Jul 2004 10:21:48 -0500 Subject: [Linux-cluster] unable to mount gfs partition In-Reply-To: <200407211014.49096.danderso@redhat.com> References: <40FD0C5B.9060601@nextnationnet.com> <40FDCCDC.5040402@nextnationnet.com> <200407211014.49096.danderso@redhat.com> Message-ID: <200407211021.48671.danderso@redhat.com> On Wednesday 21 July 2004 10:14, Derek Anderson wrote: > > hi, > > i checked the dmesg, the error is : > > > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > > > where as in /var/log/messages the error is : > > > > lock_gulm: ERROR Got a -111 trying to login to lock_gulmd. Is it runni > > ng? > > lock_gulm: ERROR cm_login failed. -111 > > lock_gulm: ERROR Got a -111 trying to start the threads. > > lock_gulm: fsid=cluster1:gfs1: Exiting gulm_mount with errors -111 > > GFS: can't mount proto = lock_gulm, table = cluster1:gfs1, hostdata = > > > > i got 2 nodes in the gfs cluster. 
1 is the lock_gulm server and the > > other one is not. > > the one that not a lock_gulm server giving me mount error... > > > > Did i need to start the lock_gulm daemon on this server that is not the > > lock_gulm server? > > > > When i start the lock_gulmd on this server it gave me this error in > > /var/log/messages: > > > > lock_gulmd[18399]: You are running in Standard mode. > > lock_gulmd[18399]: I am (clu2.abc.com) with ip (192.168.11.212) > > lock_gulmd[18399]: Forked core [18400]. > > lock_gulmd_core[18400]: ERROR [core_io.c:1029] Got error from reply: > > (clu1:192. > > 168.11.211) 1006:Not Allowed > > > > my cluster.ccs : > > > > cluster { > > name = "smsgateclu" > > lock_gulm { > > servers = ["clu1"] > > heartbeat_rate = 0.3 > > allowed_misses = 1 > > } > > } > > Like tadpol said in the last post, you are most likely expired. Where are > people getting these ridiculously low examples of heartbeat_rate and > allowed_misses? No wonder you're fenced. Doh! Right out of the GFS 6.0 manual, huh? I think we should change that example (Table 6.1). Anyway, you should try something closer to the defaults of heartbeat_rate=15, allowed_misses=2 to keep your nodes from getting unnecessarily fenced. Depending on network traffic load you can move it down. It's one of those things you kind of have to play with. > > > nodes.ccs: > > > > nodes { > > clu1 { > > ip_interfaces { > > eth2 = "192.168.11.211" > > } > > fence { > > human { > > admin { > > ipaddr = "192.168.11.211" > > } > > } > > } > > } > > clu2 { > > ip_interfaces { > > eth2 = "192.168.11.212" > > } > > fence { > > human { > > admin { > > ipaddr = "192.168.11.212" > > } > > } > > } > > } > > } > > > > fence.ccs: > > > > fence_devices { > > admin { > > agent = "fence_manual" > > } > > } > > > > Please help! > > > > Thanks. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From adam.cassar at netregistry.com.au Wed Jul 21 22:30:37 2004 From: adam.cassar at netregistry.com.au (Adam Cassar) Date: Thu, 22 Jul 2004 08:30:37 +1000 Subject: [Linux-cluster] Quotas Message-ID: <1090449037.22972.157.camel@akira.nro.au.com> Hi Guys, I've got GFS running in a two node set up on kernel 2.6.7 on debian stable. I can mount both partitions and normal file access seems fine. However any quota related commands just hang: ie ./gfs_quota init -f /mnt just sits there and is unkillable. Below are some interesting lines from ps: 5 0 621 1 16 0 4312 1080 - Ss ? 0:00 ./ccsd 5 0 622 621 16 0 4312 1080 - S ? 0:00 ./ccsd 5 0 623 622 16 0 4312 1080 414395 S ? 0:03 ./ccsd 1 0 625 1 9 -6 0 0 cluste S< ? 0:00 [cman_comms] 5 0 626 1 10 -5 0 0 member S< ? 0:00 [cman_memb] 1 0 627 1 15 0 0 0 servic S ? 0:00 [cman_serviced] 1 0 628 1 9 -6 0 0 hello_ S< ? 0:00 [cman_hbeat] 5 0 631 1 18 0 1344 484 pause Ss ? 0:00 fenced 1 0 632 1 19 0 0 0 kcl_jo D ? 0:00 [cman_userjoin] 1 0 641 1 15 0 0 0 dlm_re S ? 0:00 [dlm_recoverd] 1 0 642 1 15 0 0 0 dlm_as S ? 0:30 [dlm_astd] 1 0 643 1 15 0 0 0 dlm_re S ? 0:13 [dlm_recvd] 1 0 644 1 15 0 0 0 dlm_se S ? 0:10 [dlm_sendd] 1 0 645 1 15 0 0 0 dlm_as S ? 0:18 [lock_dlm] 1 0 646 1 15 0 0 0 dlm_as S ? 0:20 [lock_dlm] 1 0 647 1 15 0 0 0 - S ? 0:06 [gfs_scand] 1 0 648 1 15 0 0 0 gfs_gl S ? 0:05 [gfs_glockd] 1 0 649 1 15 0 0 0 - S ? 0:00 [gfs_recoverd] 1 0 650 1 15 0 0 0 - S ? 0:00 [gfs_logd] 1 0 651 1 15 0 0 0 - S ? 0:00 [gfs_quotad] 1 0 652 1 15 0 0 0 - S ? 0:00 [gfs_inoded] 5 0 1080 286 17 0 5704 1692 - S ? 0:00 /usr/sbin/sshd 5 1002 1082 1080 16 0 5716 1784 - S ? 
0:00 /usr/sbin/sshd 0 1002 1083 1082 17 0 2208 1212 wait4 Ss pts/1 0:00 -bash 4 0 1087 1083 15 0 2288 1308 wait4 S pts/1 0:00 -su 4 0 1070 609 18 0 1284 400 - R+ pts/0 0:28 ./gfs_quota init -f /mnt From adam.cassar at netregistry.com.au Thu Jul 22 01:12:22 2004 From: adam.cassar at netregistry.com.au (Adam Cassar) Date: Thu, 22 Jul 2004 11:12:22 +1000 Subject: [Linux-cluster] gnbd crash Message-ID: <1090458742.22972.192.camel@akira.nro.au.com> Guys, I was attempting to use gnbd. I exported the device on the server and attempted to mount it on the client. mount /dev/gnbd/export1 /mnt on the client hangs. The server shows the following in dmesg: Unable to handle kernel paging request at virtual address 19191959 printing eip: f8982489 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: gnbd lock_dlm dlm cman gfs lock_harness dm_mod CPU: 0 EIP: 0060:[] Not tainted EFLAGS: 00010282 (2.6.7) EIP is at name_to_directory_nodeid+0x15/0xf9 [dlm] eax: 19191919 ebx: f1d02cd4 ecx: 00000000 edx: f1d02cc4 esi: 00000000 edi: 19191919 ebp: f1d02cc4 esp: f74d3df4 ds: 007b es: 007b ss: 0068 Process dlm_recoverd (pid: 641, threadinfo=f74d2000 task=f4a00940) Stack: 00000000 00000025 00000001 c01ed6dd c01ede1c 00000000 f1d02cd4 00000000 dc8260d4 f1d02cc4 f898258e 19191919 f1d02d3d 00000000 f8982b51 f1d02cc4 f74d3e58 00000008 c04010a0 00000246 00000001 c18d7ccc dc8260bc c18d7cd4 Call Trace: [] scrup+0x13b/0x14f [] complement_pos+0x20/0x183 [] get_directory_nodeid+0x21/0x25 [dlm] [] dlm_dir_rebuild_send+0xec/0x27d [dlm] [] rcom_process_message+0x2d6/0x558 [dlm] [] rcom_send_message+0x1c5/0x217 [dlm] [] dlm_dir_rebuild_local+0x12b/0x29c [dlm] [] ls_reconfig+0x79/0x292 [dlm] [] do_ls_recovery+0x166/0x436 [dlm] [] dlm_recoverd+0x143/0x16e [dlm] [] default_wake_function+0x0/0x12 [] ret_from_fork+0x6/0x14 [] default_wake_function+0x0/0x12 [] dlm_recoverd+0x0/0x16e [dlm] [] kernel_thread_helper+0x5/0xb Code: 83 7f 40 01 74 65 8b 44 24 34 89 44 24 04 8b 44 24 30 89 04 From czoffoli at xmerlin.org Wed Jul 21 00:00:28 2004 From: czoffoli at xmerlin.org (Christian Zoffoli) Date: Wed, 21 Jul 2004 02:00:28 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <20040720171235.GG23619@phlogiston.msp.redhat.com> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> Message-ID: <40FDB21C.1030401@xmerlin.org> Benjamin Marzinski wrote: [cut] > If you do not want to enable multipathing or run GFS on the gnbd server, > you can just add a -c to your export line. ...I need multipathing ;) ...with -c it works [cut] > > # cpt null > > if you get something like > > Connect failure: Operation not permitted Yes, I get a message like this one. > then either cman isn't running on the node, or magma cannot connect to it. > If cman is running correctly (check you logs) Then look in > /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need > to install the magma plugins, which are located in /cluster/magma-plugins. cman is running and I have a sm.so plugin Christian From zhuyfa at lenovo.com Thu Jul 22 05:56:37 2004 From: zhuyfa at lenovo.com (zhuyfa at lenovo.com) Date: Thu, 22 Jul 2004 13:56:37 +0800 Subject: [Linux-cluster] Does Gfs only run in kernel 2.6.7 ? (all) Message-ID: *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* ????(??)???? ???????????? ???(walkinair) 13810259866 010-58864076 zhuyfa at lenovo.com ?????????6? ??8688?? 100085 *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* ???????????,??????????; ??,?????! 
?????????,???????????; ??,?????!! ???????????,???????; ??,????!!! _________________________________________________ From lserinol at gmail.com Thu Jul 22 08:23:33 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 11:23:33 +0300 Subject: [Linux-cluster] GFS maximum filesystem size ? Message-ID: <2c1942a7040722012362fb3810@mail.gmail.com> Hi, what is the maximum GFS filesystem size ? AFAIK, documents in redhat.com site says that this limit is 2TB. Also, it says that bigger filesystem size support will be available when RHEL kernel migrated to to 2.6. anybody here knows when rhel kernel 2.6 will be released ? thanks, From john.hearns at clustervision.com Thu Jul 22 08:31:24 2004 From: john.hearns at clustervision.com (John Hearns) Date: Thu, 22 Jul 2004 09:31:24 +0100 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <2c1942a7040722012362fb3810@mail.gmail.com> References: <2c1942a7040722012362fb3810@mail.gmail.com> Message-ID: <1090485084.6205.7.camel@vigor12> On Thu, 2004-07-22 at 09:23, Levent Serinol wrote: > Hi, > > what is the maximum GFS filesystem size ? AFAIK, documents in > redhat.com site says that this limit is 2TB. Also, it says that > bigger filesystem size support will be available when RHEL kernel > migrated to to 2.6. > anybody here knows when rhel kernel 2.6 will be released ? This is of course best answered by someone from Redhat. But if you want to work with 2.6 now, why not install an R+D system running Fedora 2? RPMs available from: http://www2.wantstofly.org/gfs/ That's the concept of Fedora - a faster release cycle, so you can try out leading edge features, and of course a closer relationship with the community. This leaves RHEL to be more stable, less frequent release cycle,with a much longer time till End of Life. I think it s fair bet that RHEL 4 will have 2.6. From julien.senon at toulouse.inra.fr Thu Jul 22 09:16:05 2004 From: julien.senon at toulouse.inra.fr (Julien Senon) Date: Thu, 22 Jul 2004 11:16:05 +0200 Subject: [Linux-cluster] Problem with GFS Message-ID: <40FF85D5.7090804@toulouse.inra.fr> Hi, I am a problem with GFS : When I execute this command : "ls" in a directory which are mounted by GFS, and which contains 100 sub-directory, I wait 20min after the command line was type. Who are the problem ? and What is the atime ? Thank you for yours response. Julien Senon -- Julien Senon - julien.senon at toulouse.inra.fr - INRA - G??nopole plateforme Bioinformatique - Unit?? de Biom??trie et Intelligence Artificielle BP 27 - 31326 Castanet Tolosan cedex From lserinol at gmail.com Thu Jul 22 09:53:48 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 12:53:48 +0300 Subject: [Linux-cluster] lock_gulm is very slow. why ? Message-ID: <2c1942a704072202534d487950@mail.gmail.com> Hi, I have done some benchmark tests with postmark(tests repeated many times). There is one client (also it is lock server). and another one which exports it's scsi hard disk with gnbd. 
filesystem created with lock_gulm: ----------------------------------------- Time: 94 seconds total 7 seconds of transactions (142 per second) Files: 10692 created (113 per second) Creation alone: 10000 files (434 per second) Mixed with transactions: 692 files (98 per second) 899 read (128 per second) 101 appended (14 per second) 10692 deleted (113 per second) Deletion alone: 10384 files (162 per second) Mixed with transactions: 308 files (44 per second) Data: 21.05 megabytes read (229.28 kilobytes per second) 250.41 megabytes written (2.66 megabytes per second) filesystem created with no_lock: -------------------------------------- Time: 35 seconds total 4 seconds of transactions (250 per second) Files: 10692 created (305 per second) Creation alone: 10000 files (454 per second) Mixed with transactions: 692 files (173 per second) 899 read (224 per second) 101 appended (25 per second) 10692 deleted (305 per second) Deletion alone: 10384 files (1153 per second) Mixed with transactions: 308 files (77 per second) Data: 21.05 megabytes read (615.77 kilobytes per second) 250.41 megabytes written (7.15 megabytes per second) as you can see nolock results is 2 times (some parts 3 times) faster then with locked one . what could be the problem ? is there any workaround or settune option (releasing locks earlier,etc...) ? From lists at wikidev.net Thu Jul 22 10:20:46 2004 From: lists at wikidev.net (Gabriel Wicke) Date: Thu, 22 Jul 2004 12:20:46 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <40FDB21C.1030401@xmerlin.org> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> Message-ID: <1090491646.1306.19.camel@venus> On Wed, 2004-07-21 at 02:00 +0200, Christian Zoffoli wrote: > Benjamin Marzinski wrote: > [cut] > > If you do not want to enable multipathing or run GFS on the gnbd server, > > you can just add a -c to your export line. > > ...I need multipathing ;) ...with -c it works > > > [cut] > > > > # cpt null > > > > if you get something like > > > > Connect failure: Operation not permitted > > Yes, I get a message like this one. > > > then either cman isn't running on the node, or magma cannot connect to it. > > If cman is running correctly (check you logs) Then look in > > /usr/lib/magma/plugins. You should have a sm.so file there. If not, you need > > to install the magma plugins, which are located in /cluster/magma-plugins. > > cman is running and I have a sm.so plugin On my system (debian unstable) it expects the plugin folder in /lib/magma/plugins, you could add a symlink and see if it works. Else you can run a test program from the magma source dir, magma/tests/cpt null. Stracing this will show you the place it's looking for (will show an ENOENT near the end of the strace). The reason for this problem seems to be the usage of $libdir in the magma-plugins makefiles or somesuch. -- Gabriel Wicke From arekm at pld-linux.org Thu Jul 22 11:05:17 2004 From: arekm at pld-linux.org (Arkadiusz Miskiewicz) Date: Thu, 22 Jul 2004 13:05:17 +0200 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <1090485084.6205.7.camel@vigor12> References: <2c1942a7040722012362fb3810@mail.gmail.com> <1090485084.6205.7.camel@vigor12> Message-ID: <200407221305.17025.arekm@pld-linux.org> On Thursday 22 of July 2004 10:31, John Hearns wrote: > But if you want to work with 2.6 now, why not install an R+D system > running Fedora 2? 
> RPMs available from: http://www2.wantstofly.org/gfs/ None of gfs related rpms I have seen so far doesn't include system integration scripts like initscripts etc. Does anyone have nice conception or better working scripts to fully integrate GFS with system like Fedora? -- Arkadiusz Mi?kiewicz CS at FoE, Wroclaw University of Technology arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux From mtilstra at redhat.com Thu Jul 22 14:53:45 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Thu, 22 Jul 2004 09:53:45 -0500 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <2c1942a704072202534d487950@mail.gmail.com> References: <2c1942a704072202534d487950@mail.gmail.com> Message-ID: <20040722145345.GA22628@redhat.com> On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > Hi, > I have done some benchmark tests with postmark(tests repeated many > times). There is one client (also it is lock server). and another one > which exports it's scsi hard disk with gnbd. [snipped a lot of nice data] > as you can see nolock results is 2 times (some parts 3 times) faster > then with locked one . > what could be the problem ? is there any workaround or settune option > (releasing locks earlier,etc...) ? the biggest thing you are probably running into is that when running with lock_nolock, gfs knows that it is not in a cluster, therefor it can enable some optimisations that only work for lcoal filesystems. These optimisations would corrupt disk data if you had multiple nodes mounted. There is also no network traffic for handling lock in lock_nolock, but that is minor compaired to the local file system optimisations. Basically, gfs with lock_nolock should always be quite faster than with any cluster locking (lock_gulm or lock_dlm). Ken could say more on this. -- Michael Conrad Tadpol Tilstra Reality is for people who lack imagination. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From kpreslan at redhat.com Thu Jul 22 14:58:37 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Thu, 22 Jul 2004 09:58:37 -0500 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <20040722145345.GA22628@redhat.com> References: <2c1942a704072202534d487950@mail.gmail.com> <20040722145345.GA22628@redhat.com> Message-ID: <20040722145837.GA29470@potassium.msp.redhat.com> On Thu, Jul 22, 2004 at 09:53:45AM -0500, Michael Conrad Tadpol Tilstra wrote: > On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > > Hi, > > I have done some benchmark tests with postmark(tests repeated many > > times). There is one client (also it is lock server). and another one > > which exports it's scsi hard disk with gnbd. > [snipped a lot of nice data] > > as you can see nolock results is 2 times (some parts 3 times) faster > > then with locked one . > > what could be the problem ? is there any workaround or settune option > > (releasing locks earlier,etc...) ? > > the biggest thing you are probably running into is that when running > with lock_nolock, gfs knows that it is not in a cluster, therefor it can > enable some optimisations that only work for lcoal filesystems. These > optimisations would corrupt disk data if you had multiple nodes mounted. You can turn off those optimizations with lock_nolock by mounting with "-o ignore_local_fs". That will let us figure out what is optimizations and what is lock latency. 
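As a concrete form of Ken's suggestion above, the lock_nolock filesystem in the benchmark would be remounted with the local-only optimizations disabled; the device path here is a placeholder for whichever pool or partition the test filesystem lives on:

    mount -t gfs /dev/pool/testpool /mnt -o ignore_local_fs

Re-running postmark on that mount separates what lock_gulm costs in lock latency from what lock_nolock gains through the local-filesystem optimizations.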
> There is also no network traffic for handling lock in lock_nolock, but > that is minor compaired to the local file system optimisations. > > Basically, gfs with lock_nolock should always be quite faster than with > any cluster locking (lock_gulm or lock_dlm). > > Ken could say more on this. > > -- > Michael Conrad Tadpol Tilstra > Reality is for people who lack imagination. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From amir at datacore.ch Thu Jul 22 15:05:59 2004 From: amir at datacore.ch (Amir Guindehi) Date: Thu, 22 Jul 2004 17:05:59 +0200 Subject: [Linux-cluster] GFS maximum filesystem size ? In-Reply-To: <200407221305.17025.arekm@pld-linux.org> References: <2c1942a7040722012362fb3810@mail.gmail.com> <1090485084.6205.7.camel@vigor12> <200407221305.17025.arekm@pld-linux.org> Message-ID: <40FFD7D7.1020702@datacore.ch> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Arkadiusz, | None of gfs related rpms I have seen so far doesn't include system integration | scripts like initscripts etc. | | Does anyone have nice conception or better working scripts to fully integrate | GFS with system like Fedora? I wrote init scripts for my GenToo system. You can find them inside of the Ebuilds i published in the GFS section at: https://open.datacore.ch/page/GFS.Install The scripts are able to start the cluster, join the fence domain, start gnbd inport/export and finally to mount the GFS filesystem automatically. Regards - - Amir - -- Amir Guindehi, nospam.amir at datacore.ch DataCore GmbH, Witikonerstrasse 289, 8053 Zurich, Switzerland -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-nr1 (Windows 2000) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFA/9fVbycOjskSVCwRAhY7AKCfQUWBFgEsl7RbNr0qHiQ6kd8NWgCgmzdU 23vcTSkgfu+/0/c0VCi6pmI= =eMYj -----END PGP SIGNATURE----- From jbrassow at redhat.com Thu Jul 22 15:28:19 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Thu, 22 Jul 2004 10:28:19 -0500 Subject: [Linux-cluster] Does Gfs only run in kernel 2.6.7 ? (all) In-Reply-To: References: Message-ID: GFS 6.0.0 runs on the 2.4 kernel - look at ftp.redhat.com/pub/redhat/linux/updates/enterprise/3AS/en/RHGFS/SRPMS GFS (cvs/devel) runs on the 2.6 kernel - look at sources.redhat.com/cluster brassow On Jul 22, 2004, at 12:56 AM, zhuyfa at lenovo.com wrote: > > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > ????(??)???? ???????????? > ???(walkinair) 13810259866 > 010-58864076 zhuyfa at lenovo.com > ?????????6? ??8688?? 100085 > *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* > ???????????,??????????; > ??,?????! > ?????????,???????????; > ??,?????!! > ???????????,???????; > ??,????!!! > _________________________________________________ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... 
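For the GFS 6.0.0 / 2.4 route Jonathan points at above, the SRPMS from that directory can be rebuilt against the running kernel; as noted earlier in the thread the --target has to match the kernel's architecture (i686, athlon, ...). The package file names below are the ones mentioned in this thread, and /usr/src/redhat is simply the default build location:

    rpmbuild --rebuild --target i686 GFS-6.0.0-1.2.src.rpm
    rpm -ivh /usr/src/redhat/RPMS/i686/GFS-*.rpm
    depmod -a                    # should come back clean if the arch matches the running kernel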
Name: not available Type: text/enriched Size: 1920 bytes Desc: not available URL: From jbrassow at redhat.com Thu Jul 22 15:31:24 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Thu, 22 Jul 2004 10:31:24 -0500 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <1090491646.1306.19.camel@venus> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> <1090491646.1306.19.camel@venus> Message-ID: <30CBEEAA-DBF4-11D8-B716-000A957BB1F6@redhat.com> On Jul 22, 2004, at 5:20 AM, Gabriel Wicke wrote: > On Wed, 2004-07-21 at 02:00 +0200, Christian Zoffoli wrote: >> Benjamin Marzinski wrote: >> [cut] >>> If you do not want to enable multipathing or run GFS on the gnbd >>> server, >>> you can just add a -c to your export line. >> >> ...I need multipathing ;) ...with -c it works >> >> >> [cut] >>> >>> # cpt null >>> >>> if you get something like >>> >>> Connect failure: Operation not permitted >> >> Yes, I get a message like this one. >> >>> then either cman isn't running on the node, or magma cannot connect >>> to it. >>> If cman is running correctly (check you logs) Then look in >>> /usr/lib/magma/plugins. You should have a sm.so file there. If not, >>> you need >>> to install the magma plugins, which are located in >>> /cluster/magma-plugins. >> >> cman is running and I have a sm.so plugin > > > On my system (debian unstable) it expects the plugin folder > in /lib/magma/plugins, you could add a symlink and see if it works. > Else > you can run a test program from the magma source dir, magma/tests/cpt > null. Stracing this will show you the place it's looking for (will show > an ENOENT near the end of the strace). > The reason for this problem seems to be the usage of $libdir in the > magma-plugins makefiles or somesuch. > I noticed the configure scripts are not always consistent WRT %{libdir}. I believe this may be causing some of the confusion... As stated, the symlinks will work, but I intend to correct the configure scripts in cvs soon. brassow From danderso at redhat.com Thu Jul 22 15:38:23 2004 From: danderso at redhat.com (Derek Anderson) Date: Thu, 22 Jul 2004 10:38:23 -0500 Subject: [Linux-cluster] Quotas In-Reply-To: <1090449037.22972.157.camel@akira.nro.au.com> References: <1090449037.22972.157.camel@akira.nro.au.com> Message-ID: <200407221038.23945.danderso@redhat.com> I am not seeing this on fedora core 2. [root at link-11 /]# mount /dev/hda2 on / type ext3 (rw) none on /proc type proc (rw) none on /sys type sysfs (rw) none on /dev/pts type devpts (rw,gid=5,mode=620) usbdevfs on /proc/bus/usb type usbdevfs (rw) /dev/hda1 on /boot type ext3 (rw) none on /dev/shm type tmpfs (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) /dev/sda1 on /data1 type gfs (rw) [root at link-11 /]# gfs_quota init -f /data1 [root at link-11 /]# gfs_quota list -f /data1 user root: limit: 880.0 warn: 870.0 value: 98.1 user bin: limit: 890.0 warn: 880.0 value: 0.0 user daemon: limit: 900.0 warn: 890.0 value: 0.0 user adm: limit: 910.0 warn: 900.0 value: 0.0 user lp: limit: 920.0 warn: 910.0 value: 0.0 user sync: limit: 930.0 warn: 920.0 value: 0.0 user shutdown: limit: 940.0 warn: 930.0 value: 0.0 user halt: limit: 950.0 warn: 940.0 value: 0.0 user mail: limit: 960.0 warn: 950.0 value: 0.0 user news: limit: 970.0 warn: 960.0 value: 0.0 user uucp: limit: 980.0 warn: 970.0 value: 0.0 user operator: limit: 990.0 warn: 980.0 value: 0.0 . . . 
group root: limit: 12600.0 warn: 12500.0 value: 98.1 group bin: limit: 12700.0 warn: 12600.0 value: 0.0 group daemon: limit: 12800.0 warn: 12700.0 value: 0.0 group sys: limit: 12900.0 warn: 12800.0 value: 0.0 group adm: limit: 13000.0 warn: 12900.0 value: 0.0 group tty: limit: 13100.0 warn: 13000.0 value: 0.0 group disk: limit: 13200.0 warn: 13100.0 value: 0.0 group lp: limit: 13300.0 warn: 13200.0 value: 0.0 group mem: limit: 13400.0 warn: 13300.0 value: 0.0 group kmem: limit: 13500.0 warn: 13400.0 value: 0.0 . . . [root at link-11 /]# gfs_quota warn -u bin -l 6666 -f /data1 [root at link-11 /]# gfs_quota list -f /data1 user root: limit: 880.0 warn: 870.0 value: 98.1 user bin: limit: 890.0 warn: 6666.0 value: 0.0 . . . On Wednesday 21 July 2004 17:30, Adam Cassar wrote: > Hi Guys, > > I've got GFS running in a two node set up on kernel 2.6.7 on debian > stable. > > I can mount both partitions and normal file access seems fine. However > any quota related commands just hang: ie > > ./gfs_quota init -f /mnt > > just sits there and is unkillable. Below are some interesting lines from > ps: > > 5 0 621 1 16 0 4312 1080 - Ss ? 0:00 > ./ccsd > 5 0 622 621 16 0 4312 1080 - S ? 0:00 > ./ccsd > 5 0 623 622 16 0 4312 1080 414395 S ? 0:03 > ./ccsd > 1 0 625 1 9 -6 0 0 cluste S< ? 0:00 > [cman_comms] > 5 0 626 1 10 -5 0 0 member S< ? 0:00 > [cman_memb] > 1 0 627 1 15 0 0 0 servic S ? 0:00 > [cman_serviced] > 1 0 628 1 9 -6 0 0 hello_ S< ? 0:00 > [cman_hbeat] > 5 0 631 1 18 0 1344 484 pause Ss ? 0:00 > fenced > 1 0 632 1 19 0 0 0 kcl_jo D ? 0:00 > [cman_userjoin] > 1 0 641 1 15 0 0 0 dlm_re S ? 0:00 > [dlm_recoverd] > 1 0 642 1 15 0 0 0 dlm_as S ? 0:30 > [dlm_astd] > 1 0 643 1 15 0 0 0 dlm_re S ? 0:13 > [dlm_recvd] > 1 0 644 1 15 0 0 0 dlm_se S ? 0:10 > [dlm_sendd] > 1 0 645 1 15 0 0 0 dlm_as S ? 0:18 > [lock_dlm] > 1 0 646 1 15 0 0 0 dlm_as S ? 0:20 > [lock_dlm] > 1 0 647 1 15 0 0 0 - S ? 0:06 > [gfs_scand] > 1 0 648 1 15 0 0 0 gfs_gl S ? 0:05 > [gfs_glockd] > 1 0 649 1 15 0 0 0 - S ? 0:00 > [gfs_recoverd] > 1 0 650 1 15 0 0 0 - S ? 0:00 > [gfs_logd] > 1 0 651 1 15 0 0 0 - S ? 0:00 > [gfs_quotad] > 1 0 652 1 15 0 0 0 - S ? 0:00 > [gfs_inoded] > 5 0 1080 286 17 0 5704 1692 - S ? 0:00 > /usr/sbin/sshd > 5 1002 1082 1080 16 0 5716 1784 - S ? 0:00 > /usr/sbin/sshd > 0 1002 1083 1082 17 0 2208 1212 wait4 Ss pts/1 0:00 -bash > 4 0 1087 1083 15 0 2288 1308 wait4 S pts/1 0:00 -su > 4 0 1070 609 18 0 1284 400 - R+ pts/0 0:28 > ./gfs_quota init -f /mnt > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From amanthei at redhat.com Thu Jul 22 15:36:41 2004 From: amanthei at redhat.com (Adam Manthei) Date: Thu, 22 Jul 2004 10:36:41 -0500 Subject: [Linux-cluster] Quotas In-Reply-To: <200407221038.23945.danderso@redhat.com> References: <1090449037.22972.157.camel@akira.nro.au.com> <200407221038.23945.danderso@redhat.com> Message-ID: <20040722153641.GI17867@redhat.com> On Thu, Jul 22, 2004 at 10:38:23AM -0500, Derek Anderson wrote: > I am not seeing this on fedora core 2. > On Wednesday 21 July 2004 17:30, Adam Cassar wrote: > > Hi Guys, > > > > I've got GFS running in a two node set up on kernel 2.6.7 on debian > > stable. > > > > I can mount both partitions and normal file access seems fine. However > > any quota related commands just hang: ie > > > > ./gfs_quota init -f /mnt > > > > just sits there and is unkillable. Below are some interesting lines from > > ps: > > > > 1 0 632 1 19 0 0 0 kcl_jo D ? 
0:00 > > [cman_userjoin] Could this be the problem? ^^^^^^^^^ -- Adam Manthei From lserinol at gmail.com Thu Jul 22 17:58:24 2004 From: lserinol at gmail.com (Levent Serinol) Date: Thu, 22 Jul 2004 20:58:24 +0300 Subject: [Linux-cluster] lock_gulm is very slow. why ? In-Reply-To: <20040722145837.GA29470@potassium.msp.redhat.com> References: <2c1942a704072202534d487950@mail.gmail.com> <20040722145345.GA22628@redhat.com> <20040722145837.GA29470@potassium.msp.redhat.com> Message-ID: <2c1942a704072210583051ca1e@mail.gmail.com> here is the result with ignore_local_fs option: Time: 81 seconds total 6 seconds of transactions (166 per second) Files: 10692 created (132 per second) Creation alone: 10000 files (434 per second) Mixed with transactions: 692 files (115 per second) 899 read (149 per second) 101 appended (16 per second) 10692 deleted (132 per second) Deletion alone: 10384 files (199 per second) Mixed with transactions: 308 files (51 per second) Data: 21.05 megabytes read (266.07 kilobytes per second) 250.41 megabytes written (3.09 megabytes per second) On Thu, 22 Jul 2004 09:58:37 -0500, Ken Preslan wrote: > On Thu, Jul 22, 2004 at 09:53:45AM -0500, Michael Conrad Tadpol Tilstra wrote: > > On Thu, Jul 22, 2004 at 12:53:48PM +0300, Levent Serinol wrote: > > > Hi, > > > I have done some benchmark tests with postmark(tests repeated many > > > times). There is one client (also it is lock server). and another one > > > which exports it's scsi hard disk with gnbd. > > [snipped a lot of nice data] > > > as you can see nolock results is 2 times (some parts 3 times) faster > > > then with locked one . > > > what could be the problem ? is there any workaround or settune option > > > (releasing locks earlier,etc...) ? > > > > the biggest thing you are probably running into is that when running > > with lock_nolock, gfs knows that it is not in a cluster, therefor it can > > enable some optimisations that only work for lcoal filesystems. These > > optimisations would corrupt disk data if you had multiple nodes mounted. > > You can turn off those optimizations with lock_nolock by mounting with > "-o ignore_local_fs". That will let us figure out what is optimizations > and what is lock latency. > > > There is also no network traffic for handling lock in lock_nolock, but > > that is minor compaired to the local file system optimisations. > > > > Basically, gfs with lock_nolock should always be quite faster than with > > any cluster locking (lock_gulm or lock_dlm). > > > > Ken could say more on this. > > > > -- > > Michael Conrad Tadpol Tilstra > > Reality is for people who lack imagination. > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > http://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Ken Preslan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > -- -- Stay out of the road, if you want to grow old. ~ Pink Floyd ~. 
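For reference, the lock_nolock vs. ignore_local_fs vs. lock_gulm comparison
discussed in this thread can be run as three separate mounts of the same
filesystem, with the same postmark workload against each (unmounting between
runs).  This is only a rough sketch -- the device path, the mount point and
the postmark parameters below are assumptions, not taken from the thread,
and the lock_nolock mounts must only ever be done from a single node:

# 1) no cluster locking, local-fs optimisations left on (the fast baseline)
mount -t gfs /dev/your_gfs_device /gfs -o lockproto=lock_nolock

# 2) no cluster locking, but local-fs optimisations disabled, so any
#    remaining difference against (3) is mostly lock latency
mount -t gfs /dev/your_gfs_device /gfs -o lockproto=lock_nolock,ignore_local_fs

# 3) the normal clustered mount, using the lock module the filesystem
#    was built with (lock_gulm in this thread)
mount -t gfs /dev/your_gfs_device /gfs

# same postmark run against each mount, e.g.
postmark <<EOF
set location /gfs/pmtest
set number 10000
set transactions 1000
run
quit
EOF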
From merlin at studiobz.it Thu Jul 22 19:20:19 2004 From: merlin at studiobz.it (Christian Zoffoli) Date: Thu, 22 Jul 2004 21:20:19 +0200 Subject: [Linux-cluster] GNBD: cannot connect to cluster manager ...Operation not permitted In-Reply-To: <1090491646.1306.19.camel@venus> References: <40FBEB78.8040305@studiobz.it> <20040720171235.GG23619@phlogiston.msp.redhat.com> <40FDB21C.1030401@xmerlin.org> <1090491646.1306.19.camel@venus> Message-ID: <41001373.9070001@studiobz.it> Gabriel Wicke wrote: [cut] > > On my system (debian unstable) it expects the plugin folder > in /lib/magma/plugins, you could add a symlink and see if it works. Else > you can run a test program from the magma source dir, magma/tests/cpt > null. Stracing this will show you the place it's looking for (will show > an ENOENT near the end of the strace). > The reason for this problem seems to be the usage of $libdir in the > magma-plugins makefiles or somesuch. You are right, the problem is the path ...now it works. Thank you very much. Christian From stephen.willey at framestore-cfc.com Fri Jul 23 10:53:46 2004 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Fri, 23 Jul 2004 11:53:46 +0100 Subject: [Linux-cluster] GFS is *very* slow when NFS exported Message-ID: <4100EE3A.6050403@framestore-cfc.com> We are looking at using GFS for load balanced NFS. Going with GNBD exported GFS isn't really an option since we're talking about over 1000 machines needing to access the storage. For this reason we were looking at providing a relatively small cluster of GFS machines serving the storage via NFS. The clients would then use round-robin DNS to load balance across these servers. The results we've got back from our tests are shown below: -= Setup ========================================- Two machines with GFS filesystem mounted from a dual-port RAID, connected to ethernet via GigE and serving NFS as follows: /mnt/gfs *(no_root_squash,rw,insecure,async) -= End of Setup =================================- -= Local GFS Filesystem Access ==================- The tests show Mb/s computed by using 10Gb dd 1 machine, write (gfstest1): 166 1 machine, read (gfstest2 reading file gfstest1 created): 139.5 2 machines, sim writes (different files): 113.7 (gfstest1) 108 (gfstest2) - 221.7aggr 2 machines, sim reads (different files): 101.5 (gfstest1) 101.6 (gfstest2) - 203aggr 2 machines, sim reads (same file): 130 (gfstest1) 134 (gfstest2) - 264aggr -= End of Local GFS Filesystem Access ===========- -= NFS/GFS Access ===============================- 1 write: (client1 to gfstest1) 42.3 1 read: (client1 from gfstest1) Varies enormously between 10-35Mb/s) Simultaneous writes and reads done as follows: client1 NFS mounting gfstest1 clients2 NFS mounting gfstest2 2 sim writes (different files): 39.6 (client1) 42.4 (client2) - 82aggr 2 sim reads (different files): 11.3 (client1) 11.2 (client2) - 22.5aggr -= End of NFS/GFS Access ========================- -= NFS/XFS Access ===============================- Done for comparison of XFS & GFS export speeds 1 write: 65.2 1 read: 73.5 -= End of NFS/XFS Access ========================- We know NFS isn't the highest performing thing in the world, but it's a concern that the NFS performance of a GFS mounted filesystem is so much lower than that of an XFS system. There are of course, the clustering overheads, but this would affect local performance as well. Anyone got any ideas as to why this might be and how to get more performance? 
Thanks, Stephen From linux-cluster-rhn at chaj.com Fri Jul 23 19:54:51 2004 From: linux-cluster-rhn at chaj.com (linux-cluster-rhn at chaj.com) Date: Fri, 23 Jul 2004 15:54:51 -0400 (EDT) Subject: [Linux-cluster] compilation woes Message-ID: Here is the output of my compile attempt of the cluster dir (dl'd) via CVS. I compiled the LVM and GFS stuff into my patched 2.6.7 kernel tree. Any ideas? Thanks. [root at live1 cluster]# make cd cman-kernel && make all make[1]: Entering directory `/content/src/gfs/cluster/cman-kernel' cd src && make all make[2]: Entering directory `/content/src/gfs/cluster/cman-kernel/src' rm -f cluster ln -s . cluster make -C /usr/src/linux-2.6.7 M=/content/src/gfs/cluster/cman-kernel/src modules USING_KBUILD=yes make[3]: Entering directory `/content/src/linux-2.6.7' Building modules, stage 2. MODPOST *** Warning: "sigprocmask" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "release_sock" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_destroy" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__kmalloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_init_data" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__kfree_skb" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vmalloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "del_timer" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_open" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "malloc_sizes" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "remove_wait_queue" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_release" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "simple_strtoul" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_recvmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_printf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "remove_proc_entry" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_recv_datagram" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_create_kern" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_rfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_read" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "jiffies" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__write_lock_failed" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_sendpage" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_mmap" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "default_wake_function" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "wait_for_completion" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_socketpair" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "proc_mkdir" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! 
*** Warning: "sk_alloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "printk" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "alloc_skb" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_sendmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "panic" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "copy_to_user" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_listen" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_accept" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "strstr" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sk_free" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "mod_timer" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "fput" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "lock_sock" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_over_panic" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_queue_tail" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_alloc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy_toiovec" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "system_utsname" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "datagram_poll" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_register" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "schedule" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "schedule_timeout" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "local_bh_enable" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "create_proc_entry" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "put_cmsg" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "wake_up_process" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kmem_cache_create" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "vsnprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__wake_up" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "net_ratelimit" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sock_no_connect" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "do_gettimeofday" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "add_wait_queue" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_lseek" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "sk_run_filter" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kfree" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kill_proc" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "___pskb_trim" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! 
*** Warning: "sock_unregister" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "memcpy_fromiovec" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "set_user_nice" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "fget" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "kernel_thread" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__up_wakeup" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "complete" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "snprintf" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "seq_release" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "__down_failed" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "copy_from_user" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "daemonize" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! *** Warning: "skb_free_datagram" [/content/src/gfs/cluster/cman-kernel/src/cman.ko] undefined! CC /content/src/gfs/cluster/cman-kernel/src/cman.mod.o /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:9: error: variable `__this_module' has initializer but incomplete type /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: error: unknown field `name' specified in initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: warning: excess elements in struct initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:10: warning: (near initialization for `__this_module') /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: error: unknown field `init' specified in initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: warning: excess elements in struct initializer /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:11: warning: (near initialization for `__this_module') /content/src/gfs/cluster/cman-kernel/src/cman.mod.c:9: error: storage size of `__this_module' isn't known make[4]: *** [/content/src/gfs/cluster/cman-kernel/src/cman.mod.o] Error 1 make[3]: *** [modules] Error 2 make[3]: Leaving directory `/content/src/linux-2.6.7' make[2]: *** [all] Error 2 make[2]: Leaving directory `/content/src/gfs/cluster/cman-kernel/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/content/src/gfs/cluster/cman-kernel' make: *** [all] Error 2 From jbrassow at redhat.com Fri Jul 23 20:56:24 2004 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Fri, 23 Jul 2004 15:56:24 -0500 Subject: [Linux-cluster] compilation woes In-Reply-To: References: Message-ID: looks like it can't find the kernel... ? ... Is your kernel src in /usr/src/linux-2.6 ? If not, try: > ./configure --kernel_src= > make install brassow On Jul 23, 2004, at 2:54 PM, linux-cluster-rhn at chaj.com wrote: > > Here is the output of my compile attempt of the cluster dir (dl'd) via > CVS. I > compiled the LVM and GFS stuff into my patched 2.6.7 kernel tree. Any > ideas? > Thanks. > > [root at live1 cluster]# make > cd cman-kernel && make all > make[1]: Entering directory `/content/src/gfs/cluster/cman-kernel' > cd src && make all > make[2]: Entering directory `/content/src/gfs/cluster/cman-kernel/src' > rm -f cluster > ln -s . cluster > make -C /usr/src/linux-2.6.7 M=/content/src/gfs/cluster/cman-kernel/src > modules USING_KBUILD=yes > make[3]: Entering directory `/content/src/linux-2.6.7' > Building modules, stage 2. 
> [snipped -- the rest of the quoted compile output is identical to the
> log in the original message above]
> make: *** [all] Error 2
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> http://www.redhat.com/mailman/listinfo/linux-cluster
> 

From lhh at redhat.com  Mon Jul 26 15:29:24 2004
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 26 Jul 2004 11:29:24 -0400
Subject: [Linux-cluster] Removal of message header cruft from magma/message.c
Message-ID: <1090855764.4427.28.camel@dhcp83-21.boston.redhat.com>

Jon Brassow noted that this wasn't necessary.  He's right; it isn't
anymore.
;) -- Lon Index: lib/Makefile =================================================================== RCS file: /cvs/cluster/cluster/magma/lib/Makefile,v retrieving revision 1.2 diff -u -r1.2 Makefile --- lib/Makefile 1 Jul 2004 13:35:46 -0000 1.2 +++ lib/Makefile 26 Jul 2004 15:22:31 -0000 @@ -72,7 +72,7 @@ memberlist.o clist.o ${AR} cr $@ $^ -libmagmamsg.a: message.o crc32.o fdops.o +libmagmamsg.a: message.o fdops.o ${AR} cr $@ $^ %.o: %.c Index: lib/message.c =================================================================== RCS file: /cvs/cluster/cluster/magma/lib/message.c,v retrieving revision 1.3 diff -u -r1.3 message.c --- lib/message.c 1 Jul 2004 13:35:46 -0000 1.3 +++ lib/message.c 26 Jul 2004 15:22:31 -0000 @@ -49,8 +49,6 @@ #define IPV6_PORT_OFFSET 1 -int clu_crc32(void *, int); - /* From fdops.c */ @@ -80,62 +78,6 @@ static pthread_mutex_t fill_mutex = PTHREAD_MUTEX_INITIALIZER; -struct __attribute__ ((packed)) msg_struct { - uint32_t ms_count; /* number of bytes in payload */ - uint32_t ms_crc32; /* CRC32 of data */ -}; - - -/** - Create a message buffer with a header including length and data CRC. - - @param payload data to send - @param len length of message to add header to - @param msg allocated within: message + header - @return Total size of allocated buffer. - */ -static unsigned long -msg_create(void *payload, ssize_t len, void **msg) -{ - unsigned long ret; - struct msg_struct msg_hdr; - - memset(&msg_hdr, 0, sizeof (msg_hdr)); - msg_hdr.ms_count = len; - msg_hdr.ms_crc32 = clu_crc32(payload, len); -#if __BYTE_ORDER == __BIG_ENDIAN - msg_hdr.ms_count = bswap_32(msg_hdr.ms_count); - msg_hdr.ms_crc32 = bswap_32(msg_hdr.ms_crc32); -#endif - - if (!len || !payload) - return sizeof (msg_hdr); - - *msg = (void *) malloc(sizeof (msg_hdr) + len); - if (*msg == NULL) { - errno = ENOMEM; - return -1; - } - memcpy(*msg, &msg_hdr, sizeof (msg_hdr)); - memcpy(*msg + sizeof (msg_hdr), payload, len); - - ret = sizeof (msg_hdr) + len; - return ret; -} - - -/** - Free a message buffer. - - @param msg Buffer to free. - */ -static inline void -msg_destroy(void *msg) -{ - if (msg != NULL) - free(msg); -} - /** Update our internal membership list with the provided list. Does NOT copy over resolved addresses; the caller may want to @@ -177,11 +119,6 @@ _msg_receive(int fd, void *buf, ssize_t count, struct timeval *tv) { - uint32_t crc; - int err; - struct msg_struct msg_hdr; - ssize_t retval = 0; - if (fd < 0) { errno = EBADF; return -1; @@ -197,36 +134,7 @@ return -1; } - if ((retval = _read_retry(fd, &msg_hdr, sizeof (msg_hdr), tv)) < - (ssize_t) sizeof (msg_hdr)) { - return -1; - } - -#if __BYTE_ORDER == __BIG_ENDIAN - msg_hdr.ms_count = bswap_32(msg_hdr.ms_count); - msg_hdr.ms_crc32 = bswap_32(msg_hdr.ms_crc32); -#endif - - if (!msg_hdr.ms_count) - return 0; - - err = errno; - retval = _read_retry(fd, buf, count, tv); - - if ((count == msg_hdr.ms_count) && (retval == count)) { - crc = clu_crc32(buf, retval); - - if (crc != msg_hdr.ms_crc32) { - /* Mangled message */ - fprintf(stderr, "CRC32 mismatch: 0x%08x vs. 0x%08x\n", - crc, msg_hdr.ms_crc32); - err = EIO; - retval = -1; - } - } - - errno = err; - return retval; + return _read_retry(fd, buf, count, tv); } @@ -234,7 +142,7 @@ Receive a message from a file descriptor w/o a timeout value. @param fd File descriptor to receive from - @param buf Pre-allocated bufffer \ + @param buf Pre-allocated bufffer @param count Size of expected message; must be <= size of preallocated buffer. 
@return -1 on failure or size of read data @@ -282,9 +190,6 @@ ssize_t msg_send(int fd, void *buf, ssize_t count) { - void *msg; - int msg_len = -1, bytes_written = 0; - if (fd == -1) { errno = EBADF; return -1; @@ -300,13 +205,7 @@ return -1; } - msg_len = msg_create(buf, count, &msg); - if ((bytes_written = write(fd, msg, msg_len)) < msg_len) { - msg_destroy(msg); - return -1; - } - msg_destroy(msg); - return (bytes_written - sizeof (struct msg_struct)); + return write(fd, buf, count); } @@ -914,50 +813,11 @@ ssize_t -_msg_peek(int sockfd, void *buf, ssize_t count) -{ - char *bigbuf; - ssize_t ret; - int bigbuf_sz; - int hdrsz = sizeof (struct msg_struct); - - bigbuf_sz = count + hdrsz; - bigbuf = (char *) malloc(bigbuf_sz); - if (bigbuf == NULL) - return -1; - - /* - * We need to account for the msg header. So we skip past it - * and decrement the return value by the number of bytes eaten - * up by the header. - */ - ret = recv(sockfd, bigbuf, bigbuf_sz, MSG_PEEK); - if (ret < 0) { - ret = errno; - free(bigbuf); - errno = ret; - return -1; - } - if (ret - hdrsz > 0) { - ret -= hdrsz; - if (ret > count) - ret = count; - memcpy(buf, bigbuf + hdrsz, ret); - } else { - ret = 0; - } - free(bigbuf); - - return ret; -} - - -ssize_t msg_peek(int sockfd, void *buf, ssize_t count) { if (sockfd < 0 || count > MSG_MAX_SIZE) { return -1; } - return (_msg_peek(sockfd, buf, count)); + return recv(sockfd, buf, count, MSG_PEEK); } From laza at yu.net Mon Jul 26 17:08:35 2004 From: laza at yu.net (Lazar Obradovic) Date: Mon, 26 Jul 2004 19:08:35 +0200 Subject: [Linux-cluster] SNMP modules? Message-ID: <1090861715.13809.3.camel@laza.eunet.yu> Hello all, I'd like to develop my own fencing agents (for IBM BladeCenter and QLogic SANBox2 switches), but they will require SNMP bindings. Is that ok with general development philosophy, since I'd like to contribude them? net-snmp-5.x.x-based API? -- Lazar Obradovic, System Engineer ----- laza at YU.net YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3119901; Fax: +381 11 3119901 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3119901. ----- From canseco at fidmail.com Mon Jul 26 17:41:24 2004 From: canseco at fidmail.com (Robert) Date: Mon, 26 Jul 2004 12:41:24 -0500 Subject: [Linux-cluster] GFS: FS Mounting Issues Message-ID: <000001c47337$c5c5da10$0b50e5d8@roadkill> All, I have a question regarding the latest implementation of GFS 6.0 with RedHat Linux 3.0 Enterprise. What my company has going on is this: We have a SAN project coming up but we do not have the SAN or a similar type shared storage device available. We have the node machines on hand and are trying to work through the GFS implementation as we are new to GFS (We have run RedHat Linux since version 5.0 and all the flavors in between.). We have tried to simulate the SAN environment by utilizing the GNBD software and we have followed the instructions available at: http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd.html This is the LOCK_GULM, SLM External, and GNBD example of GFS. We have not had any problems getting the shared devices, pools, and filesystems created as followed in the documentation. 
What we have happening is that when we mount the GFS filesystem on one node
and then try to mount the filesystem on the second node, the second node
hangs when the mount command is issued.  No errors are reported on the
console or in the logs.  No errors are reported on the lock server either.
Everything appears to be working correctly, as the log information for both
machines at that instant is the same, i.e. the one that is hung has the same
log messages as the one that is not hung.

When I go to node one, with node two still trying to mount the filesystem,
and unmount the filesystem, node two immediately finishes the mount command
and everything is fine with node two.  However, when trying to mount the
filesystem on node one again, it just hangs, and so on.  It is only allowing
one node to mount the filesystem at a time.

The configuration files are all the generic examples given in the
documentation, with GNBD as the fencing mechanism (we tried manual fencing
and the same situation exists with that method).  I can provide all
configuration files and log file information if this isn't a problem that
experienced GFS users will recognize right away.

Thank you all for your time.

Robert
Fidelity Communications

ps. I'm not sure if my first message got through as I sent it via an
alternate email address, so if this is a duplicate, please ignore.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mtilstra at redhat.com  Mon Jul 26 18:05:44 2004
From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra)
Date: Mon, 26 Jul 2004 13:05:44 -0500
Subject: [Linux-cluster] GFS: FS Mounting Issues
In-Reply-To: <000001c47337$c5c5da10$0b50e5d8@roadkill>
References: <000001c47337$c5c5da10$0b50e5d8@roadkill>
Message-ID: <20040726180544.GA11937@redhat.com>

On Mon, Jul 26, 2004 at 12:41:24PM -0500, Robert wrote:
> We have not had any problems getting the shared devices, pools, and
> filesystems created as followed in the documentation. What we have
> happening is that when we mount the GFS filesystem on one node and
> then we try and mount the filesystem on the second node, the second
> node will hang when issued the command to mount the filesystem. No
> errors are reported on console or in logs. No errors are reported in
> the Lock Server either. Everything appears to be working correctly as
> the log information for both machines at that instant are the same,
> ie, the one that is hung, has the same log messages as the one that is
> not hung.

How long are the names of your nodes?  There is a name length issue in
the 6.0 code where the first 8 bytes of each node name in your cluster
need to be unique.  See bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828

-- 
Michael Conrad Tadpol Tilstra
I always wanted to be a procrastinator, never got around to it.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: 

From hanafim at asc.hpc.mil  Mon Jul 26 18:20:28 2004
From: hanafim at asc.hpc.mil (MAHMOUD HANAFI)
Date: Mon, 26 Jul 2004 14:20:28 -0400
Subject: [Linux-cluster] GFS rpms
Message-ID: <41054B6C.2090509@asc.hpc.mil>

We are currently running GFS 5.0 with full support.  Where do we get
updates?  I haven't been able to get any help from Red Hat.
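A quick way to check for the node-name collision described in the name-length
issue above (a rough sketch only; nodes.txt is a made-up file containing one
cluster node name per line, it is not part of any GFS configuration format):

# print any 8-character prefixes that occur more than once;
# any output here means two node names collide in their first 8 bytes
cut -c1-8 nodes.txt | sort | uniq -d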
From canseco at fidmail.com Mon Jul 26 18:16:48 2004 From: canseco at fidmail.com (Robert) Date: Mon, 26 Jul 2004 13:16:48 -0500 Subject: [Linux-cluster] GFS: FS Mounting Issues In-Reply-To: <20040726180544.GA11937@redhat.com> Message-ID: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Our node names are in the form: pe2650-ox.fidnet.com -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Michael Conrad Tadpol Tilstra Sent: Monday, July 26, 2004 1:06 PM To: Discussion of clustering software components including GFS Subject: Re: [Linux-cluster] GFS: FS Mounting Issues On Mon, Jul 26, 2004 at 12:41:24PM -0500, Robert wrote: > We have not had any problems getting the shared devices, pools, and > filesystems created as followed in the documentation. What we have > happening is that when we mount the GFS filesystem on one node and > then we try and mount the filesystem on the second node, the second > node will hang when issued the command to mount the filesystem. No > errors are reported on console or in logs. No errors are reported in > the Lock Server either. Everything appears to be working correctly as > the log information for both machines at that instant are the same, > ie, the one that is hung, has the same log messages as the one that is > not hung. How long are the names of your nodes? There is a name length issue in the 6.0 code where the first 8 bytes of each node in you cluster needs to be unique. See bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828 -- Michael Conrad Tadpol Tilstra I always wanted to be a procrastinator, never got around to it. From Rory_Savage.consultant at peoplesoft.com Mon Jul 26 18:52:10 2004 From: Rory_Savage.consultant at peoplesoft.com (Rory_Savage.consultant at peoplesoft.com) Date: Mon, 26 Jul 2004 14:52:10 -0400 Subject: [Linux-cluster] Trying to get GNBD with GFS Working Message-ID: Please Help! I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 filesystem from hal-n2 [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 gnbd_export: created GNBD export1 serving file /dev/hda4 log file: Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving /dev/hda4 exported with 130897620 sectors While trying to import the device on hal-n1, I am reciving the following error: [root at hal-n1 src]# gnbd_import -v -i hal-n2 gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory * My first reaction is, when did the "/sys" ever need to be in existance? I examined all of the build options for GNBD and could not find a prefrecnce location setting for anything related to "/sys". And I know this directory is not native to Red Hat (that I know of). 
System Configuration and Parameters Kernel 2.6.7 from source Kernel Config Options: CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m CONFIG_MD_RAID5=m CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_DM=m CONFIG_DM_CRYPT=m CONFIG_BLK_DEV_GNBD=m CONFIG_CLUSTER=m CONFIG_CLUSTER_DLM=m CONFIG_CLUSTER_DLM_PROCLOCKS=y CONFIG_LOCK_HARNESS=m CONFIG_GFS_FS=m CONFIG_LOCK_NOLOCK=m CONFIG_LOCK_DLM=m CONFIG_LOCK_GULM=m # GFS and GNBD sources obtain via CVS [root at hal-n1 src]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n1 src]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 join S-6,20,1 [1] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n1 src]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.1 [root at hal-n2 cluster]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n2 cluster]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 0 2 join S-1,80,2 [] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n2 cluster]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.2 -- Rory Savage, Charlotte DSI Group Product & Technology PeopleSoft Inc. 14045 Ballantyne Corporate Place Suite 101 Charlotte, NC 28277 Email: rory_savage at peoplesoft.com Phone: 704.401.1104 Fax: 704.401.1240 From Rory_Savage.consultant at peoplesoft.com Mon Jul 26 19:13:35 2004 From: Rory_Savage.consultant at peoplesoft.com (Rory_Savage.consultant at peoplesoft.com) Date: Mon, 26 Jul 2004 15:13:35 -0400 Subject: [Linux-cluster] Trying to get GNBD with GFS Working Message-ID: Please Help! I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 filesystem from hal-n2 [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 gnbd_export: created GNBD export1 serving file /dev/hda4 log file: Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving /dev/hda4 exported with 130897620 sectors While trying to import the device on hal-n1, I am reciving the following error: [root at hal-n1 src]# gnbd_import -v -i hal-n2 gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such file or directory * My first reaction is, when did the "/sys" ever need to be in existance? I examined all of the build options for GNBD and could not find a prefrecnce location setting for anything related to "/sys". And I know this directory is not native to Red Hat (that I know of). 
System Configuration and Parameters Kernel 2.6.7 from source Kernel Config Options: CONFIG_MD=y CONFIG_BLK_DEV_MD=m CONFIG_MD_LINEAR=m CONFIG_MD_RAID0=m CONFIG_MD_RAID1=m CONFIG_MD_RAID5=m CONFIG_MD_RAID6=m CONFIG_MD_MULTIPATH=m CONFIG_BLK_DEV_DM=m CONFIG_DM_CRYPT=m CONFIG_BLK_DEV_GNBD=m CONFIG_CLUSTER=m CONFIG_CLUSTER_DLM=m CONFIG_CLUSTER_DLM_PROCLOCKS=y CONFIG_LOCK_HARNESS=m CONFIG_GFS_FS=m CONFIG_LOCK_NOLOCK=m CONFIG_LOCK_DLM=m CONFIG_LOCK_GULM=m # GFS and GNBD sources obtain via CVS [root at hal-n1 src]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n1 src]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 1 2 join S-6,20,1 [1] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n1 src]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.1 [root at hal-n2 cluster]# cat /proc/cluster/nodes Node Votes Exp Sts Name 1 1 1 M hal-n1 2 1 1 M hal-n2 [root at hal-n2 cluster]# cat /proc/cluster/services Service Name GID LID State Code Fence Domain: "default" 0 2 join S-1,80,2 [] DLM Lock Space: "clvmd" 2 3 run - [1 2] [root at hal-n2 cluster]# cat /proc/cluster/status Version: 2.0.1 Config version: 1 Cluster name: xcluster Cluster ID: 28724 Membership state: Cluster-Member Nodes: 2 Expected_votes: 1 Total_votes: 2 Quorum: 1 Active subsystems: 3 Node addresses: 10.1.1.2 -- Rory Savage, Charlotte DSI Group Product & Technology PeopleSoft Inc. 14045 Ballantyne Corporate Place Suite 101 Charlotte, NC 28277 Email: rory_savage at peoplesoft.com Phone: 704.401.1104 Fax: 704.401.1240 From gshi at ncsa.uiuc.edu Mon Jul 26 19:32:19 2004 From: gshi at ncsa.uiuc.edu (Guochun Shi) Date: Mon, 26 Jul 2004 14:32:19 -0500 Subject: [Linux-cluster] GFS/GNBD configuration Message-ID: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> hi, is the configuration in the attached file for gnbd and gfs feasible? Thanks -Guochun -------------- next part -------------- A non-text attachment was scrubbed... Name: config.pdf Type: application/pdf Size: 4879 bytes Desc: not available URL: From bmarzins at redhat.com Mon Jul 26 20:27:56 2004 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Mon, 26 Jul 2004 15:27:56 -0500 Subject: [Linux-cluster] Trying to get GNBD with GFS Working In-Reply-To: References: Message-ID: <20040726202756.GK23619@phlogiston.msp.redhat.com> On Mon, Jul 26, 2004 at 03:13:35PM -0400, Rory_Savage.consultant at peoplesoft.com wrote: > > > > > Please Help! > > I have a two node cluster (hal-n1, and hal-n2). I exported the /dev/hda4 > filesystem from hal-n2 > > [root at hal-n2 cluster]# gnbd_export -c -v -e export1 -d /dev/hda4 > gnbd_export: created GNBD export1 serving file /dev/hda4 > > log file: > > Jul 26 14:22:35 hal-n2 gnbd_serv[3853]: gnbd device 'export1' serving > /dev/hda4 exported with 130897620 sectors > > While trying to import the device on hal-n1, I am reciving the following > error: > > [root at hal-n1 src]# gnbd_import -v -i hal-n2 > gnbd_import: ERROR cannot get /sys/class/gnbd/gnbd0/name value : No such > file or directory GNBD requires sysfs to run. Somewhere in you kernel config file, you should have: CONFIG_SYSFS=y Then run the command: # mount -t sysfs sysfs /sys to mount sysfs. 
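To avoid repeating that by hand after every reboot, something like the
following should work (a sketch, not from the original message; the paths
are the usual ones, but check your distribution):

# create the mount point if it does not already exist, then mount sysfs
mkdir -p /sys
mount -t sysfs sysfs /sys

# and an /etc/fstab entry so it is mounted automatically at boot:
sysfs   /sys    sysfs   defaults        0 0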
For more information on sysfs, see Documentation/filesystems/sysfs.txt Hope this helps -Ben bmarzins at redhat.com > * My first reaction is, when did the "/sys" ever need to be in existance? > I examined all of the build options for GNBD and could not find a > prefrecnce location setting for anything related to "/sys". And I know > this directory is not native to Red Hat (that I know of). > > System Configuration and Parameters > > Kernel 2.6.7 from source > > Kernel Config Options: > > CONFIG_MD=y > CONFIG_BLK_DEV_MD=m > CONFIG_MD_LINEAR=m > CONFIG_MD_RAID0=m > CONFIG_MD_RAID1=m > CONFIG_MD_RAID5=m > CONFIG_MD_RAID6=m > CONFIG_MD_MULTIPATH=m > CONFIG_BLK_DEV_DM=m > CONFIG_DM_CRYPT=m > CONFIG_BLK_DEV_GNBD=m > > CONFIG_CLUSTER=m > CONFIG_CLUSTER_DLM=m > CONFIG_CLUSTER_DLM_PROCLOCKS=y > > CONFIG_LOCK_HARNESS=m > CONFIG_GFS_FS=m > CONFIG_LOCK_NOLOCK=m > CONFIG_LOCK_DLM=m > CONFIG_LOCK_GULM=m > > # GFS and GNBD sources obtain via CVS > > [root at hal-n1 src]# cat /proc/cluster/nodes > Node Votes Exp Sts Name > 1 1 1 M hal-n1 > 2 1 1 M hal-n2 > > [root at hal-n1 src]# cat /proc/cluster/services > > Service Name GID LID State Code > Fence Domain: "default" 1 2 join > S-6,20,1 > [1] > > DLM Lock Space: "clvmd" 2 3 run - > [1 2] > > [root at hal-n1 src]# cat /proc/cluster/status > Version: 2.0.1 > Config version: 1 > Cluster name: xcluster > Cluster ID: 28724 > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 1 > Total_votes: 2 > Quorum: 1 > Active subsystems: 3 > Node addresses: 10.1.1.1 > > [root at hal-n2 cluster]# cat /proc/cluster/nodes > Node Votes Exp Sts Name > 1 1 1 M hal-n1 > 2 1 1 M hal-n2 > > [root at hal-n2 cluster]# cat /proc/cluster/services > > Service Name GID LID State Code > Fence Domain: "default" 0 2 join > S-1,80,2 > [] > > DLM Lock Space: "clvmd" 2 3 run - > [1 2] > > [root at hal-n2 cluster]# cat /proc/cluster/status > Version: 2.0.1 > Config version: 1 > Cluster name: xcluster > Cluster ID: 28724 > Membership state: Cluster-Member > Nodes: 2 > Expected_votes: 1 > Total_votes: 2 > Quorum: 1 > Active subsystems: 3 > Node addresses: 10.1.1.2 > > > > > -- > Rory Savage, Charlotte DSI Group > Product & Technology > PeopleSoft Inc. > 14045 Ballantyne Corporate Place > Suite 101 > Charlotte, NC 28277 > Email: rory_savage at peoplesoft.com > Phone: 704.401.1104 > Fax: 704.401.1240 > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From mailing-lists at hughesjr.com Mon Jul 26 21:08:09 2004 From: mailing-lists at hughesjr.com (Johnny Hughes) Date: Mon, 26 Jul 2004 16:08:09 -0500 Subject: [Linux-cluster] GFS rpms In-Reply-To: <41054B6C.2090509@asc.hpc.mil> References: <41054B6C.2090509@asc.hpc.mil> Message-ID: <1090876088.18047.5.camel@Myth.home.local> On Mon, 2004-07-26 at 13:20, MAHMOUD HANAFI wrote: > We are currently running GFS5.0 with full support. Where do we get > updates because i haven't been able to get any help from redhat. > You can download rpms from my website that run on RHEL 3 and WhiteBox EL 3 for AMD and i686 (smp, hugemem, regular) for the 2.4.21-15.0.3.EL kernel. RHEL / WBEL GFS That is not the official download. Johnny Hughes HughesJR.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From linux-cluster-rhn at chaj.com Mon Jul 26 21:17:53 2004 From: linux-cluster-rhn at chaj.com (linux-cluster-rhn at chaj.com) Date: Mon, 26 Jul 2004 17:17:53 -0400 (EDT) Subject: [Linux-cluster] requirements question In-Reply-To: <001001c4733c$b763f5b0$0b50e5d8@roadkill> References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: We've got a fiber channel connection from two nodes to a single san storage share (through a brocade). We're looking to export nfs from one of the hosts with the other host as failover. As a test, I patched a 2.6.7 kernel and compiled the necessary utilities according to http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering config that I made, and successfully mounted the lv on both hosts as a local drive. Is it necessary for us to use LVM if the san is already doing the raid/redundancy? What is the bare minimum in terms of daemons that we'd need in order to run the above setup? I'm thinking it'd be something like: ccsd cman_tool join clvmd mount -t gfs /dev/sda1 /mnt/san-name Also, what suggestions do you have for an automatic failover system for the two hosts? I imagine some sort of heartbeat package. Thanks for your time. Jim From hanafim at asc.hpc.mil Mon Jul 26 21:37:48 2004 From: hanafim at asc.hpc.mil (MAHMOUD HANAFI) Date: Mon, 26 Jul 2004 17:37:48 -0400 Subject: [Linux-cluster] GFS rpms In-Reply-To: <1090876088.18047.5.camel@Myth.home.local> References: <41054B6C.2090509@asc.hpc.mil> <1090876088.18047.5.camel@Myth.home.local> Message-ID: <410579AC.7000106@asc.hpc.mil> Thanks! Johnny Hughes wrote: > On Mon, 2004-07-26 at 13:20, MAHMOUD HANAFI wrote: > >>/We are currently running GFS5.0 with full support. Where do we get >>updates because i haven't been able to get any help from redhat. >>/ >> > > You can download rpms from my website that run on RHEL 3 and WhiteBox EL > 3 for AMD and i686 (smp, hugemem, regular) for the 2.4.21-15.0.3.EL kernel. > > RHEL / WBEL GFS > > > That is not the official download. > > Johnny Hughes > _HughesJR.com_ > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From notiggy at gmail.com Mon Jul 26 23:47:04 2004 From: notiggy at gmail.com (Brian Jackson) Date: Mon, 26 Jul 2004 18:47:04 -0500 Subject: [Linux-cluster] GFS/GNBD configuration In-Reply-To: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> References: <5.1.0.14.2.20040726142340.04937de0@pop.ncsa.uiuc.edu> Message-ID: In fact that's what I think gnbd was created for. Although it's gained more popularity as a way to export local drives from a computer. --Brian Jackson On Mon, 26 Jul 2004 14:32:19 -0500, Guochun Shi wrote: > hi, > > is the configuration in the attached file for gnbd and gfs feasible? > > Thanks > -Guochun > > From notiggy at gmail.com Tue Jul 27 00:23:57 2004 From: notiggy at gmail.com (Brian Jackson) Date: Mon, 26 Jul 2004 19:23:57 -0500 Subject: [Linux-cluster] requirements question In-Reply-To: References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: On Mon, 26 Jul 2004 17:17:53 -0400 (EDT), linux-cluster-rhn at chaj.com wrote: > > We've got a fiber channel connection from two nodes to a single san storage > share (through a brocade). We're looking to export nfs from one of the hosts > with the other host as failover. 
As a test, I patched a 2.6.7 kernel and > compiled the necessary utilities according to > http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on > the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering > config that I made, and successfully mounted the lv on both hosts as a local > drive. Is it necessary for us to use LVM if the san is already doing the > raid/redundancy? Nope, in your situation it would be most useful providing stable device naming. > What is the bare minimum in terms of daemons that we'd need > in order to run the above setup? I'm thinking it'd be something like: > > ccsd > cman_tool join > clvmd > mount -t gfs /dev/sda1 /mnt/san-name looks right > > Also, what suggestions do you have for an automatic failover system for the > two hosts? I imagine some sort of heartbeat package. Thanks for your time. Currently there is heartbeat (linux-ha.org), and a few others. I believe redhat is working on one as well that will fit in with their infrastructure bits --Brian Jackson > > Jim From teigland at redhat.com Tue Jul 27 02:47:18 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 27 Jul 2004 10:47:18 +0800 Subject: [Linux-cluster] SNMP modules? In-Reply-To: <1090861715.13809.3.camel@laza.eunet.yu> References: <1090861715.13809.3.camel@laza.eunet.yu> Message-ID: <20040727024718.GC12983@redhat.com> On Mon, Jul 26, 2004 at 07:08:35PM +0200, Lazar Obradovic wrote: > Hello all, > > I'd like to develop my own fencing agents (for IBM BladeCenter and > QLogic SANBox2 switches), but they will require SNMP bindings. > > Is that ok with general development philosophy, since I'd like to > contribude them? net-snmp-5.x.x-based API? That sounds great, we'd be happy to add them to the collection. -- Dave Teigland From teigland at redhat.com Tue Jul 27 03:01:04 2004 From: teigland at redhat.com (David Teigland) Date: Tue, 27 Jul 2004 11:01:04 +0800 Subject: [Linux-cluster] requirements question In-Reply-To: References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> Message-ID: <20040727030104.GD12983@redhat.com> On Mon, Jul 26, 2004 at 05:17:53PM -0400, linux-cluster-rhn at chaj.com wrote: > > We've got a fiber channel connection from two nodes to a single san storage > share (through a brocade). We're looking to export nfs from one of the hosts > with the other host as failover. As a test, I patched a 2.6.7 kernel and > compiled the necessary utilities according to > http://gfs.wikidev.net/Installation. I used LVM2 to create a logical volume on > the san device (/dev/sda1), gfs_mkfs'd the device according to the clustering > config that I made, and successfully mounted the lv on both hosts as a local > drive. Is it necessary for us to use LVM if the san is already doing the > raid/redundancy? What is the bare minimum in terms of daemons that we'd need > in order to run the above setup? I'm thinking it'd be something like: > > ccsd > cman_tool join > clvmd > mount -t gfs /dev/sda1 /mnt/san-name Without CLVM the steps reduce to: ccsd cman_tool join fence_tool join mount -t gfs /dev/sda1 /mnt > Also, what suggestions do you have for an automatic failover system for the > two hosts? I imagine some sort of heartbeat package. Thanks for your time. 
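(As an aside on the minimal sequence Dave lists above: wrapped up as a script it might look roughly like the sketch below. This is only an illustration of the ordering; the device path, the mount point and the lack of error handling are assumptions made for the example, not anything specified in this thread.

    #!/bin/sh
    # Rough sketch: bring one node into the cluster and mount the shared GFS
    # volume, in the order described above (no CLVM in this variant).
    set -e
    ccsd                          # cluster configuration daemon
    cman_tool join                # join the cluster
    fence_tool join               # join the fence domain
    mount -t gfs /dev/sda1 /mnt   # device and mount point are examples only

Shutdown would be the reverse: unmount first, then leave the fence domain and the cluster before stopping ccsd.)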
The "Resource Manager" will do NFS failover: https://www.redhat.com/archives/linux-cluster/2004-July/msg00121.html -- Dave Teigland From jeff at intersystems.com Tue Jul 27 12:28:20 2004 From: jeff at intersystems.com (Jeff) Date: Tue, 27 Jul 2004 08:28:20 -0400 Subject: [Linux-cluster] EDEADLOCK status in dlm In-Reply-To: <20040727024718.GC12983@redhat.com> References: <1090861715.13809.3.camel@laza.eunet.yu> <20040727024718.GC12983@redhat.com> Message-ID: <417837901.20040727082820@intersystems.com> The dlm document describes a return status of EDEADLOCK and this is referenced in ast.c and a couple of the tests. Using the latest version of CVS (I'm pretty sure) I can't find the definition for EDEADLOCK in a header file. The only definition is in one of the tests (which doesn't build) and it defines it as SS$_DEADLOCK :-) [Neither of the tests in the dlm\tests\locktest directory compile cleanly] I assume I'm missing something but I'm not sure what it is. From pcaulfie at redhat.com Tue Jul 27 13:09:19 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 27 Jul 2004 14:09:19 +0100 Subject: [Linux-cluster] EDEADLOCK status in dlm In-Reply-To: <417837901.20040727082820@intersystems.com> References: <1090861715.13809.3.camel@laza.eunet.yu> <20040727024718.GC12983@redhat.com> <417837901.20040727082820@intersystems.com> Message-ID: <20040727130919.GH14648@tykepenguin.com> On Tue, Jul 27, 2004 at 08:28:20AM -0400, Jeff wrote: > The dlm document describes a return status of EDEADLOCK > and this is referenced in ast.c and a couple of the tests. > > Using the latest version of CVS (I'm pretty sure) I can't find > the definition for EDEADLOCK in a header file. The only > definition is in one of the tests (which doesn't build) and it > defines it as SS$_DEADLOCK :-) EDEADLOCK should be in /usr/include/errno.h (actually I think its asm/errno.h) so should not need to be defined by the dlm headers. > [Neither of the tests in the dlm\tests\locktest directory > compile cleanly] That's quite probable, those are kernel modules written some time ago. If you can be bothered to manually hook them into the kernel build system I think locktest.c should work. I might get some time to fix the makefiles and old bits fixed in those but it's not a priority. I suspect pingtest only works on VMS by now :-) -- patrick From lhh at redhat.com Tue Jul 27 13:21:27 2004 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 27 Jul 2004 09:21:27 -0400 Subject: [Linux-cluster] requirements question In-Reply-To: <20040727030104.GD12983@redhat.com> References: <001001c4733c$b763f5b0$0b50e5d8@roadkill> <20040727030104.GD12983@redhat.com> Message-ID: <1090934487.8748.75.camel@dhcp83-21.boston.redhat.com> On Tue, 2004-07-27 at 11:01 +0800, David Teigland wrote: > > The "Resource Manager" will do NFS failover: > https://www.redhat.com/archives/linux-cluster/2004-July/msg00121.html > True, dat. It's still got a few kinks though. -- Lon From michael.krietemeyer at informatik.uni-rostock.de Mon Jul 26 12:06:01 2004 From: michael.krietemeyer at informatik.uni-rostock.de (Michael Krietemeyer) Date: Mon, 26 Jul 2004 14:06:01 +0200 Subject: [Linux-cluster] GFS mount problem Message-ID: <4104F3A9.8080303@informatik.uni-rostock.de> Hello We have setuped our small 4-node cluster with the RedHat 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. One cluster-node exports via gnbd two disks to the three other nodes. One of these exports is used as CCA-Device ond one for a GFS share. Now we setup GFS like the example C.1. 
in the "Red Hat GFS 6.0 Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS Client, fence method: manual). All steps work fine, except the mount. On the fist node, the mount works. The mount on the second node blocks, until the node one unmounts the gfs share. (Summary: Only one node can mount the share at the same time). Can somebody help? Michael Krietemeyer From robert at dicus.org Mon Jul 26 15:47:26 2004 From: robert at dicus.org (Robert) Date: Mon, 26 Jul 2004 10:47:26 -0500 Subject: [Linux-cluster] GFS: FS Mount Issues Message-ID: <4105278E.7020804@dicus.org> An HTML attachment was scrubbed... URL: From Carl.Bavington at ca.com Wed Jul 28 08:31:22 2004 From: Carl.Bavington at ca.com (Bavington, Carl) Date: Wed, 28 Jul 2004 09:31:22 +0100 Subject: [Linux-cluster] GFS: FS Mount Issues Message-ID: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> Robert, I am also seeing the hang on a second mount, No errors reported in logs. Did you get an answer?. Thanks, Carl Bavington mob +44 (0)7793 758327 _____ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Sent: 26 July 2004 16:47 To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS: FS Mount Issues All, I have a question regarding the latest implementation of GFS 6.0 with RedHat Linux 3.0 Enterprise. What my company has going on is this: We have a SAN project coming up but we do not have the SAN or a similar type shared storage device available. We have the node machines on hand and are trying to work through the GFS implementation as we are new to GFS (We have run RedHat Linux since version 5.0 and all the flavors in between.). We have tried to simulate the SAN environment by utilizing the GNBD software and we have followed the instructions available at: http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd. html This is the LOCK_GULM, SLM External, and GNBD example of GFS. We have not had any problems getting the shared devices, pools, and filesystems created as followed in the documentation. What we have happening is that when we mount the GFS filesystem on one node and then we try and mount the filesystem on the second node, the second node will hang when issued the command to mount the filesystem. No errors are reported on console or in logs. No errors are reported in the Lock Server either. Everything appears to be working correctly as the log information for both machines at that instant are the same, ie, the one that is hung, has the same log messages as the one that is not hung. When I go to node one, with node two still trying to mount the filesystem, and unmount the filesystem, node two immediately finishes the mount command and everything is fine with node two. However, when trying to mount node one again, it just hangs and so on. It is only allowing one node to mount the filesystem at once. The configuration files are all the generic examples given in the documentation with the fencing mechanism as GNBD (Tried fencing with manual and the same situation exists with that method.). I can provide all configuration files and also give log file information if this problem isn't something that experienced GFS users know what the problem may be. Thank you all for your time. Robert Fidelity Communications -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stephen.willey at framestore-cfc.com Wed Jul 28 08:47:33 2004 From: stephen.willey at framestore-cfc.com (Stephen Willey) Date: Wed, 28 Jul 2004 09:47:33 +0100 Subject: [Linux-cluster] GFS: FS Mount Issues In-Reply-To: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> References: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> Message-ID: <41076825.2040302@framestore-cfc.com> Have you mkfs'd the filesystem with enough journals for each machine? If you only created one journal I guess it'd do this... Stephen Bavington, Carl wrote: > Robert, > > I am also seeing the hang on a second mount, No errors reported in > logs. Did you get an answer?. > > Thanks, > > Carl Bavington > > mob +44 (0)7793 758327 > > ------------------------------------------------------------------------ > > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Robert > *Sent:* 26 July 2004 16:47 > *To:* linux-cluster at redhat.com > *Subject:* [Linux-cluster] GFS: FS Mount Issues > > All, > > I have a question regarding the latest implementation of GFS 6.0 with > RedHat Linux 3.0 Enterprise. What my company has going on is this: We > have a SAN project coming up but we do not have the SAN or a similar > type shared storage device available. We have the node machines on > hand and are trying to work through the GFS implementation as we are > new to GFS (We have run RedHat Linux since version 5.0 and all the > flavors in between.). We have tried to simulate the SAN environment by > utilizing the GNBD software and we have followed the instructions > available at: > http://www.redhat.com/docs/manuals/csgfs/admin-guide/s1-ex-slm-ext-gnbd.html > > This is the LOCK_GULM, SLM External, and GNBD example of GFS. > > We have not had any problems getting the shared devices, pools, and > filesystems created as followed in the documentation. What we have > happening is that when we mount the GFS filesystem on one node and > then we try and mount the filesystem on the second node, the second > node will hang when issued the command to mount the filesystem. No > errors are reported on console or in logs. No errors are reported in > the Lock Server either. Everything appears to be working correctly as > the log information for both machines at that instant are the same, > ie, the one that is hung, has the same log messages as the one that is > not hung. > > When I go to node one, with node two still trying to mount the > filesystem, and unmount the filesystem, node two immediately finishes > the mount command and everything is fine with node two. However, when > trying to mount node one again, it just hangs and so on. It is only > allowing one node to mount the filesystem at once. The configuration > files are all the generic examples given in the documentation with the > fencing mechanism as GNBD (Tried fencing with manual and the same > situation exists with that method.). I can provide all configuration > files and also give log file information if this problem isn?t > something that experienced GFS users know what the problem may be. > > Thank you all for your time. 
> > Robert > > Fidelity Communications > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >http://www.redhat.com/mailman/listinfo/linux-cluster > >

From laza at yu.net Wed Jul 28 08:57:11 2004 From: laza at yu.net (Lazar Obradovic) Date: Wed, 28 Jul 2004 10:57:11 +0200 Subject: [Linux-cluster] Posix ACL deps for gfs-kernel modules Message-ID: <1091005031.29997.887.camel@laza.eunet.yu>
Hi, I've been playing with out-of-tree building of gfs-related modules (actually, creating ebuilds for gfs), and found out one really stupid thing: you cannot successfully build the gfs-kernel modules if you don't have at least one "regular" filesystem built into the kernel with Posix ACL support. The thing is that the gfs-kernel tree modules expect to have the posix_acl* symbols available from an already built kernel.
Perhaps it's not a "bug" in gfs-kernel after all, since the kernel documentation states:
--- fs/KConfig ---
config FS_POSIX_ACL
# Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs)
#
# NOTE: you can implement Posix ACLs without these helpers (XFS does).
# Never use this symbol for ifdefs.
#
bool
depends on EXT2_FS_POSIX_ACL || EXT3_FS_POSIX_ACL || JFS_POSIX_ACL || REISERFS_FS_POSIX_ACL
default y
--- fs/KConfig ---
but, on the other hand, it doesn't put CONFIG_FS_POSIX_ACL in .config, so fs/Makefile ignores posix_acl.o and xattr.o when compiling the kernel.
Is this a gfs issue or a kernel issue? Can we correct this locally (somehow force the compilation of fs/posix_acl.o and fs/xattr.o if they are not available) or do we have to report this to LKML? A quick fix would be to compile the kernel with EXT3_FS_POSIX_ACL, but I'm not sure what side effects that would have on ext3 filesystems.
-- Lazar Obradovic, System Engineer ----- laza at YU.net YUnet International http://www.EUnet.yu Dubrovacka 35/III, 11000 Belgrade Tel: +381 11 3119901; Fax: +381 11 3119901 ----- This e-mail is confidential and intended only for the recipient. Unauthorized distribution, modification or disclosure of its contents is prohibited. If you have received this e-mail in error, please notify the sender by telephone +381 11 3119901. -----

From mtilstra at redhat.com Wed Jul 28 14:36:35 2004 From: mtilstra at redhat.com (Michael Conrad Tadpol Tilstra) Date: Wed, 28 Jul 2004 09:36:35 -0500 Subject: [Linux-cluster] GFS: FS Mount Issues In-Reply-To: <41076825.2040302@framestore-cfc.com> References: <08237065FA027340B731E570990978540252B184@ukslms22.ca.com> <41076825.2040302@framestore-cfc.com> Message-ID: <20040728143635.GA6734@redhat.com>
On Wed, Jul 28, 2004 at 09:47:33AM +0100, Stephen Willey wrote: > Have you mkfs'd the filesystem with enough journals for each machine? If > you only created one journal I guess it'd do this...
It should not hang if there are not enough journals. The mount should fail, and a message stating why will be in dmesg. -- Michael Conrad Tadpol Tilstra It's not reality that's important, but how you perceive things.
-------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From kpreslan at redhat.com Wed Jul 28 14:49:58 2004 From: kpreslan at redhat.com (Ken Preslan) Date: Wed, 28 Jul 2004 09:49:58 -0500 Subject: [Linux-cluster] Posix ACL deps for gfs-kernel modules In-Reply-To: <1091005031.29997.887.camel@laza.eunet.yu> References: <1091005031.29997.887.camel@laza.eunet.yu> Message-ID: <20040728144957.GA30927@potassium.msp.redhat.com> If you apply the patches in cluster/gfs-kernel/patches to the kernel and build GFS that way, things work ok. On Wed, Jul 28, 2004 at 10:57:11AM +0200, Lazar Obradovic wrote: > Hi, > > I've been playing with out-of-tree building of gfs related modules > (actually, creating ebuilds for gfs), and found out one really stupid > thing: > > You cannot successfuly build gfs-kernel modules if you don't have at > least one "regular" filesystem build in kernel with Posix ACL support. > The thing is that gfs-kernel tree modules expect to have posix_acl* > symbols available from already build kernel. > > Perhaps it's not a "bug" in gfs-kernel after all, since kernel > documentation states: > > --- fs/KConfig --- > config FS_POSIX_ACL > # Posix ACL utility routines (for now, only ext2/ext3/jfs/reiserfs) > # > # NOTE: you can implement Posix ACLs without these helpers (XFS does). > # Never use this symbol for ifdefs. > # > bool > depends on EXT2_FS_POSIX_ACL || EXT3_FS_POSIX_ACL || > JFS_POSIX_ACL || REISERFS_FS_POSIX_ACL > default y > --- fs/KConfig --- > > but, on the other hand, it doesn't put CONFIG_FS_POSIX_ACL in .config, > so fs/Makefile ignores posix_acl.o and xattr.o when compiliing kernel. > > Is this gfs or kernel issue? Can we locally correct this (somehow force > the compilation of fs/posix_acl.o and fs/xattr.o if not available) or do > we have report this to LKML? > > Quick fix would be to compile kernel with EXT3_FS_POSIX_ACL, but i'm not > sure what side-effects would that have on ext3 filesystems. > > -- > Lazar Obradovic, System Engineer > ----- > laza at YU.net > YUnet International http://www.EUnet.yu > Dubrovacka 35/III, 11000 Belgrade > Tel: +381 11 3119901; Fax: +381 11 3119901 > ----- > This e-mail is confidential and intended only for the recipient. > Unauthorized distribution, modification or disclosure of its > contents is prohibited. If you have received this e-mail in error, > please notify the sender by telephone +381 11 3119901. > ----- > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Ken Preslan From jeff at intersystems.com Wed Jul 28 15:15:17 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 11:15:17 -0400 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() Message-ID: <58786861.20040728111517@intersystems.com> dlm_unlock() is documented as being asynchronous and it takes an astarg as one of its arguments. However it does not take an AST routine as an argument. What routine gets executed when an unlock completes? 
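[The rough shape of the two calls being discussed here is sketched below. This is only an illustration: the prototypes are paraphrased from the libdlm headers of this period and may differ in detail from any given checkout, and the resource name, mode, flags and astarg strings are invented for the example.]

    /*
     * Sketch only: dlm_lock() registers a completion AST routine, while
     * dlm_unlock() accepts an astarg but no AST routine of its own,
     * which is exactly the question raised above.
     */
    #include <stdio.h>
    #include <string.h>
    #include <libdlm.h>

    static struct dlm_lksb lksb;
    static int ast_fired;

    static void compl_ast(void *astarg)          /* completion AST */
    {
        printf("completion AST (%s), sb_status=%d\n",
               (char *)astarg, lksb.sb_status);
        ast_fired = 1;
    }

    int main(void)
    {
        const char *res = "example-resource";    /* placeholder name */
        int fd = dlm_get_fd();                   /* fd to poll for ASTs */

        /* The AST routine and its argument are both supplied here... */
        if (dlm_lock(LKM_EXMODE, &lksb, LKF_NOQUEUE, res, strlen(res),
                     0, compl_ast, (void *)"lock", NULL, NULL))
            return 1;
        while (!ast_fired)
            dlm_dispatch(fd);                    /* deliver the grant AST */

        /* ...but only an astarg can be supplied here. */
        ast_fired = 0;
        if (dlm_unlock(lksb.sb_lkid, 0, &lksb, (void *)"unlock"))
            return 1;
        while (!ast_fired)
            dlm_dispatch(fd);                    /* which routine runs now? */
        return 0;
    }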
From teigland at redhat.com Wed Jul 28 15:38:40 2004 From: teigland at redhat.com (David Teigland) Date: Wed, 28 Jul 2004 23:38:40 +0800 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <58786861.20040728111517@intersystems.com> References: <58786861.20040728111517@intersystems.com> Message-ID: <20040728153840.GH13983@redhat.com> On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: > dlm_unlock() is documented as being asynchronous and it > takes an astarg as one of its arguments. However it does > not take an AST routine as an argument. > > What routine gets executed when an unlock completes? The AST routine from dlm_lock() is saved and used for dlm_unlock(). -- Dave Teigland From jeff at intersystems.com Wed Jul 28 15:57:03 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 11:57:03 -0400 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <20040728153840.GH13983@redhat.com> References: <58786861.20040728111517@intersystems.com> <20040728153840.GH13983@redhat.com> Message-ID: <1024309498.20040728115703@intersystems.com> Wednesday, July 28, 2004, 11:38:40 AM, David Teigland wrote: > On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: >> dlm_unlock() is documented as being asynchronous and it >> takes an astarg as one of its arguments. However it does >> not take an AST routine as an argument. >> >> What routine gets executed when an unlock completes? > The AST routine from dlm_lock() is saved and used for dlm_unlock(). This makes it difficult to update an application which works with other DLM's as all the completion AST routines need to be updated to test for EUNLOCK to figure out why they've been invoked. Would it be possible to add an optional argument to dlm_unlock() for the AST routine to call when the unlock completes? If this is omitted, the existing completion AST routine is executed. From teigland at redhat.com Wed Jul 28 16:06:35 2004 From: teigland at redhat.com (David Teigland) Date: Thu, 29 Jul 2004 00:06:35 +0800 Subject: [Linux-cluster] Specifyng the AST routine in dlm_unlock() In-Reply-To: <1024309498.20040728115703@intersystems.com> References: <58786861.20040728111517@intersystems.com> <20040728153840.GH13983@redhat.com> <1024309498.20040728115703@intersystems.com> Message-ID: <20040728160635.GK13983@redhat.com> On Wed, Jul 28, 2004 at 11:57:03AM -0400, Jeff wrote: > Wednesday, July 28, 2004, 11:38:40 AM, David Teigland wrote: > > > On Wed, Jul 28, 2004 at 11:15:17AM -0400, Jeff wrote: > >> dlm_unlock() is documented as being asynchronous and it > >> takes an astarg as one of its arguments. However it does > >> not take an AST routine as an argument. > >> > >> What routine gets executed when an unlock completes? > > > The AST routine from dlm_lock() is saved and used for dlm_unlock(). > > This makes it difficult to update an application which works > with other DLM's as all the completion AST routines need to be > updated to test for EUNLOCK to figure out why they've been > invoked. > > Would it be possible to add an optional argument to dlm_unlock() > for the AST routine to call when the unlock completes? > If this is omitted, the existing completion AST routine is > executed. It should be simple to add an AST routine as an arg to dlm_unlock(). 
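(Purely as an illustration of the shape such an addition could take, and nothing more: the prototype below is hypothetical, not the interface that exists in CVS, and the existing parameters are paraphrased from the userspace header.

    /* Hypothetical variant of dlm_unlock() that carries its own completion
     * AST routine.  A NULL astaddr could fall back to the routine that was
     * registered by dlm_lock(), preserving the current behaviour. */
    int dlm_unlock(uint32_t lkid, uint32_t flags, struct dlm_lksb *lksb,
                   void *astarg, void (*astaddr)(void *astarg));

Whether the same extension would be mirrored in the kernel interface is not settled in this thread.)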
-- Dave Teigland From amanthei at redhat.com Wed Jul 28 16:46:12 2004 From: amanthei at redhat.com (Adam Manthei) Date: Wed, 28 Jul 2004 11:46:12 -0500 Subject: [Linux-cluster] GFS mount problem In-Reply-To: <4104F3A9.8080303@informatik.uni-rostock.de> References: <4104F3A9.8080303@informatik.uni-rostock.de> Message-ID: <20040728164612.GD27527@redhat.com> Has your problem been resolved yet? It sounds similar to a hostname length issue that has been reported in bugzilla: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=127828 If this is the case, the short term workaround is to change your hostnames. On Mon, Jul 26, 2004 at 02:06:01PM +0200, Michael Krietemeyer wrote: > Hello > > We have setuped our small 4-node cluster with the RedHat > 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. > > One cluster-node exports via gnbd two disks to the three other nodes. > One of these exports is used as CCA-Device ond one for a GFS share. > > Now we setup GFS like the example C.1. in the "Red Hat GFS 6.0 > Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS > Client, fence method: manual). All steps work fine, except the mount. > On the fist node, the mount works. The mount on the second node blocks, > until the node one unmounts the gfs share. (Summary: Only one node can > mount the share at the same time). > > Can somebody help? > > Michael Krietemeyer > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster -- Adam Manthei From jeff at intersystems.com Wed Jul 28 16:49:48 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 12:49:48 -0400 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert Message-ID: <1524056551.20040728124948@intersystems.com> This is from device.c. The intent seems to be that if an argument is specified, then it overrides an existing value. However, a new completion ast address is only loaded if a new blocking ast address is specified. if (kparams->flags & DLM_LKF_CONVERT) { struct dlm_lkb *lkb = dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); if (!lkb) { return -EINVAL; } li = (struct lock_info *)lkb->lkb_astparam; /* Only override these if they are provided */ if (li->li_user_lksb) li->li_user_lksb = kparams->lksb; if (li->li_astparam) li->li_astparam = kparams->astparam; if (li->li_bastaddr) li->li_bastaddr = kparams->bastaddr; ---> if (li->li_bastaddr) ---> li->li_astaddr = kparams->astaddr; li->li_flags = 0; } From jeff at intersystems.com Wed Jul 28 16:54:45 2004 From: jeff at intersystems.com (Jeff) Date: Wed, 28 Jul 2004 12:54:45 -0400 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert In-Reply-To: <1524056551.20040728124948@intersystems.com> References: <1524056551.20040728124948@intersystems.com> Message-ID: <29859916.20040728125445@intersystems.com> Wednesday, July 28, 2004, 12:49:48 PM, Jeff wrote: > This is from device.c. The intent seems to > be that if an argument is specified, then it overrides > an existing value. However, a new completion ast address > is only loaded if a new blocking ast address is specified. 
> if (kparams->flags & DLM_LKF_CONVERT) { > struct dlm_lkb *lkb = > dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); > if (!lkb) { > return -EINVAL; > } > li = (struct lock_info *)lkb->lkb_astparam; > /* Only override these if they are provided */ > if (li->li_user_lksb) > li->li_user_lksb = kparams->lksb; > if (li->li_astparam) > li->li_astparam = kparams->astparam; > if (li->li_bastaddr) > li->li_bastaddr = kparams->bastaddr; --->> if (li->li_bastaddr) --->> li->li_astaddr = kparams->astaddr; > li->li_flags = 0; > } Looking at this again, shouldn't it be testing kparams-> in the if() rather than li->*? The current code seems to write new values if there were old ones as opposed to if new values are specified. From michael.krietemeyer at informatik.uni-rostock.de Wed Jul 28 06:00:05 2004 From: michael.krietemeyer at informatik.uni-rostock.de (Michael Krietemeyer) Date: Wed, 28 Jul 2004 08:00:05 +0200 Subject: [Linux-cluster] GFS mount problem In-Reply-To: <4104F3A9.8080303@informatik.uni-rostock.de> References: <4104F3A9.8080303@informatik.uni-rostock.de> Message-ID: <410740E5.8020809@informatik.uni-rostock.de> Hello Solved! The first 8 bytes of the node names are not equal! M. Krietemeyer Michael Krietemeyer wrote: > Hello > > We have setuped our small 4-node cluster with the RedHat > 2.4.21-15.0.3ELsmp Kernel and use the GFS 6.0.0-7 Package. > > One cluster-node exports via gnbd two disks to the three other nodes. > One of these exports is used as CCA-Device ond one for a GFS share. > > Now we setup GFS like the example C.1. in the "Red Hat GFS 6.0 > Administrator's Guide" (three nodes, each as LOCK_GULM Server and GFS > Client, fence method: manual). All steps work fine, except the mount. > On the fist node, the mount works. The mount on the second node blocks, > until the node one unmounts the gfs share. (Summary: Only one node can > mount the share at the same time). > > Can somebody help? > > Michael Krietemeyer > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Thu Jul 29 12:38:04 2004 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 29 Jul 2004 13:38:04 +0100 Subject: [Linux-cluster] Is this intentional: specifying a new completion ast routine on a convert In-Reply-To: <29859916.20040728125445@intersystems.com> References: <1524056551.20040728124948@intersystems.com> <29859916.20040728125445@intersystems.com> Message-ID: <20040729123803.GA26311@tykepenguin.com> On Wed, Jul 28, 2004 at 12:54:45PM -0400, Jeff wrote: > Wednesday, July 28, 2004, 12:49:48 PM, Jeff wrote: > > > This is from device.c. The intent seems to > > be that if an argument is specified, then it overrides > > an existing value. However, a new completion ast address > > is only loaded if a new blocking ast address is specified. > > > if (kparams->flags & DLM_LKF_CONVERT) { > > struct dlm_lkb *lkb = > > dlm_get_lkb(fi->fi_ls->ls_lockspace, kparams->lkid); > > if (!lkb) { > > return -EINVAL; > > } > > li = (struct lock_info *)lkb->lkb_astparam; > > > /* Only override these if they are provided */ > > if (li->li_user_lksb) > > li->li_user_lksb = kparams->lksb; > > if (li->li_astparam) > > li->li_astparam = kparams->astparam; > > if (li->li_bastaddr) > > li->li_bastaddr = kparams->bastaddr; > --->> if (li->li_bastaddr) > --->> li->li_astaddr = kparams->astaddr; > > li->li_flags = 0; > > } > > Looking at this again, shouldn't it be testing kparams-> in > the if() rather than li->*? 
The current code seems to write new > values if there were old ones as opposed to if new values > are specified.
er, yes it looks like it. I'll check in a fix when I get back home. -- patrick

From dascalu_dragos at bah.com Thu Jul 29 14:59:20 2004 From: dascalu_dragos at bah.com (Dascalu Dragos) Date: Thu, 29 Jul 2004 10:59:20 -0400 Subject: [Linux-cluster] Only root can write on GFS volume... Message-ID:
We are currently experimenting w/ GFS and have run into a problem we cannot seem to find an answer to. To set the stage: We have 3 machines connected through an optical switch to a SAN. For simplicity purposes we have created a LUN which can be seen by the 3 machines. GFS+modules+patches are successfully running. We are using LVM2 and created a volume group called "test" on /dev/sdb5, which is what the machines see the LUN as. We then created a logical volume called "one" on this volume group.
web3:~# ls -la /dev/test/ total 28 dr-x------ 2 root root 4096 Jul 29 10:01 . drwxr-xr-x 13 root root 24576 Jul 29 10:01 .. lrwx------ 1 root root 20 Jul 29 10:01 one -> /dev/mapper/test-one
web3:~# ls -la /dev/mapper/ total 28 drwxr-xr-x 2 tomcat tomcat 4096 Jul 29 10:13 . drwxr-xr-x 13 root root 24576 Jul 29 10:01 .. crw------- 1 root root 10, 63 Jul 29 10:13 control brw------- 1 root root 254, 0 Jul 29 10:01 test-one
/dev/test/one was formatted using "gfs_mkfs -p lock_dlm -t webserver:one -j 4 /dev/test/one". (there will be 4 machines in the future) System starts fine, all 3 machines are member nodes and can successfully mount /dev/test/one on "/test" (mount point on / we created for testing).
_______ Problem ------------
When /test is accessed and written to as root everything is fine, new data gets updated on the other nodes in real time. However, if another user besides root attempts to write to /test, the partition locks up (basically the shell we are in locks, appearing to wait for the return of the "touch new_file" command). In the process tree we can see the touch command, however it cannot be killed, nor can /test be unmounted. At this point an "ls -la /test" on any of the 3 machines has the same frozen behavior. Only a reboot gets things back to normal :(
We first thought that this may be an LVM2 problem, but if /dev/test/one is formatted as ext3 (not gfs) and then mounted on /test, it can be written to fine by all users including root (we also gave 777 permissions to all objects in /dev/test and /dev/mapper). This also does not seem to be an obvious OS permissions issue...
Scenario 1: drwxr-xr-x 2 root root 4096 Jul 29 10:02 test Here only root can write and everyone can read. If another user but root tries to write, they get a "touch: creating `/test/tom': Permission denied". This does not cause a system freeze.
Scenario 2: drwxrwxrwx 2 root root 4096 Jul 29 10:02 test Here everyone can do anything to this directory, and if another user but root tries to write, the system freezes.
It appears that when it is formatted as gfs, no one but root can write to it. Any thoughts? Dede.
-------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3399 bytes Desc: not available URL:

From teigland at redhat.com Fri Jul 30 04:59:05 2004 From: teigland at redhat.com (David Teigland) Date: Fri, 30 Jul 2004 12:59:05 +0800 Subject: [Linux-cluster] Only root can write on GFS volume...
In-Reply-To: References: Message-ID: <20040730045905.GB13525@redhat.com> On Thu, Jul 29, 2004 at 10:59:20AM -0400, Dascalu Dragos wrote: > It appears that when it is formatted as gfs no one but root can write > to it. Any thoughts? This was solved by updating to the latest dlm source code; the problem was fixed a couple weeks ago. -- Dave Teigland From kpfleming at backtobasicsmgmt.com Sat Jul 31 14:40:48 2004 From: kpfleming at backtobasicsmgmt.com (Kevin P. Fleming) Date: Sat, 31 Jul 2004 07:40:48 -0700 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! In-Reply-To: <410B80BC.4060100@hp.com> References: <410B80BC.4060100@hp.com> Message-ID: <410BAF70.7010205@backtobasicsmgmt.com> Aneesh Kumar K.V wrote: > 5. Devices > * there is a clusterwide device model via the devfs code Yeah, that's we want, take buggy, unreliable, soon-to-be-removed-from-mainline code and put an entire clustering layer on top of it. Too bad someone is going to need to completely reimplement this "clusterwide device model". From bruce.walker at hp.com Sat Jul 31 16:00:34 2004 From: bruce.walker at hp.com (Walker, Bruce J) Date: Sat, 31 Jul 2004 09:00:34 -0700 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! Message-ID: <3689AF909D816446BA505D21F1461AE4C750E6@cacexc04.americas.cpqcorp.net> Kevin, Got out of bed on the wrong side? Such anger. First, the clusterwide device capability is a very small part of OpenSSI so your comment "put the entire clustering layer on top of it" is COMPLETELY wrong - you clearly are commenting about something you know nothing about. In the 2.4 implementation, providing this one capability by leveraging devfs was quite economic, efficient and has been very stable. I'm not sure who you mean by "that's what WE want". If you mean the current worldwide users of OpenSSI on 2.4, they are a very happy group with a kick-ass clustering capability. About one thing you are correct. We are going to have to have a way to lookup and name remote devices in 2.6. I believe the remote file-op mechanism we are using in 2.4 will adapt easily. Bruce Walker Architect and project manager - OpenSSI project > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kevin > P. Fleming > Sent: Saturday, July 31, 2004 7:41 AM > To: Linux Kernel Mailing List > Cc: linux-cluster at redhat.com; > opengfs-devel at lists.sourceforge.net; > opengfs-users at lists.sourceforge.net; > opendlm-devel at lists.sourceforge.net > Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! > > > Aneesh Kumar K.V wrote: > > > 5. Devices > > * there is a clusterwide device model via the devfs code > > Yeah, that's we want, take buggy, unreliable, > soon-to-be-removed-from-mainline code and put an entire > clustering layer > on top of it. Too bad someone is going to need to completely > reimplement > this "clusterwide device model". > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > http://www.redhat.com/mailman/listinfo/linux-cluster > From aneesh.kumar at hp.com Sat Jul 31 11:21:32 2004 From: aneesh.kumar at hp.com (Aneesh Kumar K.V) Date: Sat, 31 Jul 2004 16:51:32 +0530 Subject: [Linux-cluster] [ANNOUNCE] OpenSSI 1.0.0 released!! Message-ID: <410B80BC.4060100@hp.com> Hi, Sorry for the cross post. I came across this on OpenSSI website. I guess others may also be interested. 
-aneesh The OpenSSI project leverages both HP's NonStop Clusters for Unixware technology and other open source technology to provide a full, highly available Single System Image environment for Linux. Feature list: 1. Cluster Membership * includes libcluster that application can use 2. Internode Communication 3. Filesystem * support for CFS over ext3, Lustre Lite * CFS can be used for the root * reopen of files, devices, ipc objects when processes move is supported * CFS supports file record locking and shared writable mapped files (along with all other standard POSIX capabilities * HA-CFS is configurable for the root or other filesystems 4. Process Management * almost all pieces there, including: o clusterwide PIDs o process migration and distributed rexec(), rfork() and migrate() with reopen of files, sockets, pipes, devices, etc. o vprocs o clusterwide signalling, get/setpriority o capabilities o distributed process groups, session, controlling terminal o surrogate origin functionality o no single points of failure (cleanup code to deal with nodedowns) o Mosix load leveler (with the process migration model from NSC) o clusterwide ptrace() and strace o clusterwide /proc/, ps, top, etc. 5. Devices * there is a clusterwide device model via the devfs code * each node mounts its devfs on /cluster/node#/dev and bind mounts it to /dev so all devices are visible and accessible from all nodes, but by default you see only local devices * a process on any node can open a device on any node * devices are reopened when processes move * processes retain a context, even if they move; the context determines which node's devices to access by defaul 6. IPC * all IPC objects/mechanisms are clusterwide: o pipes o fifos o signalling o message queues o semaphore o shared memory o Unix-domain sockets o Internet-domain sockets * reopen of IPC objects is there for process movement * nodedown handling is there for all IPC objects 7. Clusterwide TCP/IP * HA-LVS is integrated, with extensions * extension is that port redirection to servers in the cluster is automatic and doesn't have to be managed. 8. Kernel Data Replication Service * it is in there (cluster/ssi/clreg) 9. Shared Storage * we have tested shared FCAL and use it for HA-CFS 10. DLM * is integrated with CLMS and is HA 11. Sysadmin * services architecture has been made clusterwide 12. Init, Booting and Run Levels * system runs with a single init which will failover/restart on another node if the node it is on dies 13. Application Availability * application monitoring/restart provided by spawndaemon/keepalive * services started by RC on the initnode will automatically restart on a failure of the initnode 14. Timesync * NTP for now 15. Load Leveling * adapted the openMosix algorithm * for connection load balancing, using HA-LVS * load leveling is on by default * applications must be registered to load level 16. Packaging/Install * Have source patch, binary RPMs and CVS source options; * Debian packages also available via ap-get repository. * First node is incremental to a standard Linux install * Other nodes install via netboot, PXEboot, DHCP and simple addnode command; 17. 
Object Interfaces * standard interfaces for objects work as expected * no new interfaces for object location or movement except for processes (rexec(), migrate(), and /proc/pid/goto to move a process) From tao at acc.umu.se Sat Jul 31 16:35:58 2004 From: tao at acc.umu.se (David Weinehall) Date: Sat, 31 Jul 2004 18:35:58 +0200 Subject: [Linux-cluster] Re: [ANNOUNCE] OpenSSI 1.0.0 released!! In-Reply-To: <410B80BC.4060100@hp.com> References: <410B80BC.4060100@hp.com> Message-ID: <20040731163558.GA10689@khan.acc.umu.se> On Sat, Jul 31, 2004 at 04:51:32PM +0530, Aneesh Kumar K.V wrote: > Hi, > > Sorry for the cross post. I came across this on OpenSSI website. I guess > others may also be interested. > > -aneesh > > The OpenSSI project leverages both HP's NonStop Clusters for Unixware > technology and other open source technology to provide a full, highly > available Single System Image environment for Linux. I can already hear SCO's lawyers screaming "They are taking technology from UnixWare and incorporating in Linux! Let's sue them!!!"... That said, this looks really interesting. Regards: David Weinehall -- /) David Weinehall /) Northern lights wander (\ // Maintainer of the v2.0 kernel // Dance across the winter sky // \) http://www.acc.umu.se/~tao/ (/ Full colour fire (/ From crh at ubiqx.mn.org Fri Jul 30 23:15:51 2004 From: crh at ubiqx.mn.org (Christopher R. Hertel) Date: Fri, 30 Jul 2004 18:15:51 -0500 Subject: [Linux-cluster] Re: Welcome to the "Linux-cluster" mailing list In-Reply-To: References: Message-ID: <20040730231551.GB20038@Favog.ubiqx.mn.org> Man, that was fast... -- "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq. ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org