From klakshman03 at hotmail.com Sun Feb 1 14:50:43 2009 From: klakshman03 at hotmail.com (lakshmana swamy) Date: Sun, 1 Feb 2009 20:20:43 +0530 Subject: [Linux-cluster] GFS Issue In-Reply-To: <20090130170011.5B1548E09FB@hormel.redhat.com> References: <20090130170011.5B1548E09FB@hormel.redhat.com> Message-ID: Hi Friends, I have configured gfs on RHEL 5.2, While mounting It gives the following error messages. ==================== [root at lvs1 ~]# mount -t gfs -v /dev/sdb1 /share/ /sbin/mount.gfs: mount /dev/sdb1 /share /sbin/mount.gfs: parse_opts: opts = "rw" /sbin/mount.gfs: clear flag 1 for "rw", flags = 0 /sbin/mount.gfs: parse_opts: flags = 0 /sbin/mount.gfs: parse_opts: extra = "" /sbin/mount.gfs: parse_opts: hostdata = "" /sbin/mount.gfs: parse_opts: lockproto = "" /sbin/mount.gfs: parse_opts: locktable = "" /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: /sbin/mount.gfs: write "join /share gfs lock_dlm laxman:gfs rw /dev/sdb1" /sbin/mount.gfs: fs is for a different cluster /sbin/mount.gfs: error mounting lockproto lock_dlm [root at lvs1 ~]# =================== log messages here Feb 1 19:52:23 lvs1 gfs_controld[7267]: mount: fs requires cluster="laxman" current="locuz" Feb 1 19:59:46 lvs1 gfs_controld[7267]: mount: fs requires cluster="laxman" current="locuz" ======= I have installed all the necessary packages. But Iam unable to figure out whats the problem, pls help me !!! Thanking you laxman _________________________________________________________________ Find a better job. We have plenty. Visit MSN Jobs http://www.in.msn.com/jobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeder at invision.net Sun Feb 1 15:51:27 2009 From: jeder at invision.net (Jeremy Eder) Date: Sun, 1 Feb 2009 10:51:27 -0500 Subject: [Linux-cluster] (no subject) Message-ID: <1734CA24F5FC1848880E6B1AB788DD7701E3DF19@INV-EX1.ad.invision.net> when you created the gfs filesystem you specified a cluster name different from the cluster that that the machine belongs to. reconcile the difference one way or the other and it will mount. From stewart at epits.com.au Sun Feb 1 16:15:26 2009 From: stewart at epits.com.au (Stewart Walters) Date: Mon, 02 Feb 2009 01:15:26 +0900 Subject: [Linux-cluster] GFS Issue In-Reply-To: References: <20090130170011.5B1548E09FB@hormel.redhat.com> Message-ID: <4985CA9E.3090702@epits.com.au> lakshmana swamy wrote: > > Hi Friends, > > I have configured gfs on RHEL 5.2, While mounting It gives the > following error messages. > > ==================== > > [root at lvs1 ~]# mount -t gfs -v /dev/sdb1 /share/ > /sbin/mount.gfs: mount /dev/sdb1 /share > /sbin/mount.gfs: parse_opts: opts = "rw" > /sbin/mount.gfs: clear flag 1 for "rw", flags = 0 > /sbin/mount.gfs: parse_opts: flags = 0 > /sbin/mount.gfs: parse_opts: extra = "" > /sbin/mount.gfs: parse_opts: hostdata = "" > /sbin/mount.gfs: parse_opts: lockproto = "" > /sbin/mount.gfs: parse_opts: locktable = "" > /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: > /sbin/mount.gfs: write "join /share gfs lock_dlm laxman:gfs rw /dev/sdb1" > */sbin/mount.gfs: fs is for a different cluster > /sbin/mount.gfs: error mounting lockproto lock_dlm* > [root at lvs1 ~]# > > =================== > log messages here > * > Feb 1 19:52:23 lvs1 gfs_controld[7267]: mount: fs requires > cluster="laxman" current="locuz" > Feb 1 19:59:46 lvs1 gfs_controld[7267]: mount: fs requires > cluster="laxman" current="locuz" > * > ======= > > I have installed all the necessary packages. 
But Iam unable to figure > out whats the problem, pls help me !!! > > Thanking you > > laxman > > > > > > ------------------------------------------------------------------------ > Get a view of the world through MSN Video. Some things just cannot be > left unseen. Try it! > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster GFS volumes are tagged to the cluster they are locked by. This is done when they are created. I've never seen this message before, but it seems straight forward - the cluster name defined in /etc/cluster/cluster.conf is different from the one tagged on the GFS volume. In this case the cluster name in /etc/cluster/cluster.conf is defined as "locuz", but the GFS volumes were tagged as belonging to "laxman" Check cluster.conf: [root at clusternode01 ~]# cat /etc/cluster/cluster.conf|grep config_version Now check the GFS volume: [root at clusternode01 ~]# gfs_tool list 18146975903901458037 dm-2 cluster01:gfs.0 You'll probably find in your case that name="cluster01" and "cluster01:gfs" don't match. To change it, unmount the volume from all nodes, then check out the "man gfs_tool" for the "gfs_tool sb device table" command for how to modify the GFS volume. Also check https://wiki.ncl.cs.columbia.edu/wiki/index.php/GFS_Initialization for an example on how to use the command. Regards, Stewart From stewart at epits.com.au Sun Feb 1 16:19:46 2009 From: stewart at epits.com.au (Stewart Walters) Date: Mon, 02 Feb 2009 01:19:46 +0900 Subject: [Linux-cluster] Samba with CTDB support timeline? Message-ID: <4985CBA2.1040002@epits.com.au> Hi, Not strictly RHCS related, but does anyone know when RHEL is likely to ship with a version of Samba that has CTDB support by default? Regards, Stewart From crh at ubiqx.mn.org Sun Feb 1 18:44:44 2009 From: crh at ubiqx.mn.org (Christopher R. Hertel) Date: Sun, 01 Feb 2009 12:44:44 -0600 Subject: [Linux-cluster] Samba with CTDB support timeline? In-Reply-To: <4985CBA2.1040002@epits.com.au> References: <4985CBA2.1040002@epits.com.au> Message-ID: <4985ED9C.9030700@ubiqx.mn.org> Most of the work on CTDB clustering (of which I am aware) is being done with Samba on GPFS. It would be great if someone would take on the task of making CTDB work on GFS. Chris -)----- Stewart Walters wrote: > Hi, > > Not strictly RHCS related, but does anyone know when RHEL is likely to > ship with a version of Samba that has CTDB support by default? > > Regards, > > Stewart > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- "Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq. ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org From adas at redhat.com Mon Feb 2 03:52:59 2009 From: adas at redhat.com (Abhijith Das) Date: Sun, 01 Feb 2009 21:52:59 -0600 Subject: [Linux-cluster] Samba with CTDB support timeline? In-Reply-To: <4985CBA2.1040002@epits.com.au> References: <4985CBA2.1040002@epits.com.au> Message-ID: <49866E1B.4050701@redhat.com> Stewart Walters wrote: > Hi, > > Not strictly RHCS related, but does anyone know when RHEL is likely to > ship with a version of Samba that has CTDB support by default? 
> > Regards, > > Stewart > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluste We are in the process of getting the ctdb package into Fedora. Once in Fedora, it will make it's way into RHEL. Unfortunately, I'm not aware of the timeline. Cheers! --Abhi From klakshman03 at hotmail.com Mon Feb 2 04:48:16 2009 From: klakshman03 at hotmail.com (lakshmana swamy) Date: Mon, 2 Feb 2009 10:18:16 +0530 Subject: [Linux-cluster] RE: Linux-cluster Digest, Vol 58, Issue 1 In-Reply-To: <20090201170007.4F4FF61D0FB@hormel.redhat.com> References: <20090201170007.4F4FF61D0FB@hormel.redhat.com> Message-ID: Thank you very much Jeremy Eder & Stewart It works fine Thanking you, laxman > From: linux-cluster-request at redhat.com > Subject: Linux-cluster Digest, Vol 58, Issue 1 > To: linux-cluster at redhat.com > Date: Sun, 1 Feb 2009 12:00:07 -0500 > > Send Linux-cluster mailing list submissions to > linux-cluster at redhat.com > > To subscribe or unsubscribe via the World Wide Web, visit > https://www.redhat.com/mailman/listinfo/linux-cluster > or, via email, send a message with subject or body 'help' to > linux-cluster-request at redhat.com > > You can reach the person managing the list at > linux-cluster-owner at redhat.com > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-cluster digest..." > > > Today's Topics: > > 1. GFS Issue (lakshmana swamy) > 2. (no subject) (Jeremy Eder) > 3. Re: GFS Issue (Stewart Walters) > 4. Samba with CTDB support timeline? (Stewart Walters) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 1 Feb 2009 20:20:43 +0530 > From: lakshmana swamy > Subject: [Linux-cluster] GFS Issue > To: > Message-ID: > Content-Type: text/plain; charset="iso-8859-1" > > > > Hi Friends, > > I have configured gfs on RHEL 5.2, While mounting It gives the following error messages. > > ==================== > > [root at lvs1 ~]# mount -t gfs -v /dev/sdb1 /share/ > /sbin/mount.gfs: mount /dev/sdb1 /share > /sbin/mount.gfs: parse_opts: opts = "rw" > /sbin/mount.gfs: clear flag 1 for "rw", flags = 0 > /sbin/mount.gfs: parse_opts: flags = 0 > /sbin/mount.gfs: parse_opts: extra = "" > /sbin/mount.gfs: parse_opts: hostdata = "" > /sbin/mount.gfs: parse_opts: lockproto = "" > /sbin/mount.gfs: parse_opts: locktable = "" > /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: > /sbin/mount.gfs: write "join /share gfs lock_dlm laxman:gfs rw /dev/sdb1" > /sbin/mount.gfs: fs is for a different cluster > /sbin/mount.gfs: error mounting lockproto lock_dlm > [root at lvs1 ~]# > > =================== > log messages here > > Feb 1 19:52:23 lvs1 gfs_controld[7267]: mount: fs requires cluster="laxman" current="locuz" > Feb 1 19:59:46 lvs1 gfs_controld[7267]: mount: fs requires cluster="laxman" current="locuz" > > ======= > > I have installed all the necessary packages. But Iam unable to figure out whats the problem, pls help me !!! > > Thanking you > > laxman > > > > > > _________________________________________________________________ > Find a better job. We have plenty. Visit MSN Jobs > http://www.in.msn.com/jobs > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: https://www.redhat.com/archives/linux-cluster/attachments/20090201/c6ab6b85/attachment.html > > ------------------------------ > > Message: 2 > Date: Sun, 1 Feb 2009 10:51:27 -0500 > From: Jeremy Eder > Subject: [Linux-cluster] (no subject) > To: linux clustering > Message-ID: > <1734CA24F5FC1848880E6B1AB788DD7701E3DF19 at INV-EX1.ad.invision.net> > Content-Type: text/plain; charset="us-ascii" > > when you created the gfs filesystem you specified a cluster name different from the cluster that that the machine belongs to. reconcile the difference one way or the other and it will mount. > > > > ------------------------------ > > Message: 3 > Date: Mon, 02 Feb 2009 01:15:26 +0900 > From: Stewart Walters > Subject: Re: [Linux-cluster] GFS Issue > To: linux clustering > Message-ID: <4985CA9E.3090702 at epits.com.au> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > lakshmana swamy wrote: > > > > Hi Friends, > > > > I have configured gfs on RHEL 5.2, While mounting It gives the > > following error messages. > > > > ==================== > > > > [root at lvs1 ~]# mount -t gfs -v /dev/sdb1 /share/ > > /sbin/mount.gfs: mount /dev/sdb1 /share > > /sbin/mount.gfs: parse_opts: opts = "rw" > > /sbin/mount.gfs: clear flag 1 for "rw", flags = 0 > > /sbin/mount.gfs: parse_opts: flags = 0 > > /sbin/mount.gfs: parse_opts: extra = "" > > /sbin/mount.gfs: parse_opts: hostdata = "" > > /sbin/mount.gfs: parse_opts: lockproto = "" > > /sbin/mount.gfs: parse_opts: locktable = "" > > /sbin/mount.gfs: message to gfs_controld: asking to join mountgroup: > > /sbin/mount.gfs: write "join /share gfs lock_dlm laxman:gfs rw /dev/sdb1" > > */sbin/mount.gfs: fs is for a different cluster > > /sbin/mount.gfs: error mounting lockproto lock_dlm* > > [root at lvs1 ~]# > > > > =================== > > log messages here > > * > > Feb 1 19:52:23 lvs1 gfs_controld[7267]: mount: fs requires > > cluster="laxman" current="locuz" > > Feb 1 19:59:46 lvs1 gfs_controld[7267]: mount: fs requires > > cluster="laxman" current="locuz" > > * > > ======= > > > > I have installed all the necessary packages. But Iam unable to figure > > out whats the problem, pls help me !!! > > > > Thanking you > > > > laxman > > > > > > > > > > > > ------------------------------------------------------------------------ > > Get a view of the world through MSN Video. Some things just cannot be > > left unseen. Try it! > > ------------------------------------------------------------------------ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > GFS volumes are tagged to the cluster they are locked by. This is done > when they are created. > > I've never seen this message before, but it seems straight forward - the > cluster name defined in /etc/cluster/cluster.conf is different from the > one tagged on the GFS volume. In this case the cluster name in > /etc/cluster/cluster.conf is defined as "locuz", but the GFS volumes > were tagged as belonging to "laxman" > > Check cluster.conf: > > [root at clusternode01 ~]# cat /etc/cluster/cluster.conf|grep config_version > > > Now check the GFS volume: > > [root at clusternode01 ~]# gfs_tool list > 18146975903901458037 dm-2 cluster01:gfs.0 > > You'll probably find in your case that name="cluster01" and > "cluster01:gfs" don't match. > > To change it, unmount the volume from all nodes, then check out the "man > gfs_tool" for the "gfs_tool sb device table" command for how to modify > the GFS volume. 
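For reference, a minimal sketch of that superblock change, assuming the device is /dev/sdb1 and the cluster name in cluster.conf is "locuz" as in the log messages earlier in this thread (adjust both to the real setup):

gfs_tool sb /dev/sdb1 table             # print the current lock table, e.g. laxman:gfs
gfs_tool sb /dev/sdb1 table locuz:gfs   # rewrite it to <clustername>:<fsname>; run this only with the fs unmounted on every node

The other option is to leave the superblock alone and rename the cluster to "laxman" in /etc/cluster/cluster.conf on all nodes; either way the two names must match before mount.gfs stops reporting "fs is for a different cluster".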
Also check > https://wiki.ncl.cs.columbia.edu/wiki/index.php/GFS_Initialization for > an example on how to use the command. > > Regards, > > Stewart > > > > ------------------------------ > > Message: 4 > Date: Mon, 02 Feb 2009 01:19:46 +0900 > From: Stewart Walters > Subject: [Linux-cluster] Samba with CTDB support timeline? > To: Linux Cluster Mailing List > Message-ID: <4985CBA2.1040002 at epits.com.au> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > Hi, > > Not strictly RHCS related, but does anyone know when RHEL is likely to > ship with a version of Samba that has CTDB support by default? > > Regards, > > Stewart > > > > ------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > End of Linux-cluster Digest, Vol 58, Issue 1 > ******************************************** _________________________________________________________________ Wish to Marry Now? Join MSN Matrimony FREE! http://www.in.msn.com/matrimony -------------- next part -------------- An HTML attachment was scrubbed... URL: From yamato at redhat.com Mon Feb 2 07:34:14 2009 From: yamato at redhat.com (Masatake YAMATO) Date: Mon, 02 Feb 2009 16:34:14 +0900 (JST) Subject: [Linux-cluster] [PATCH] use defined constant instead of raw literal in totemsrp.c In-Reply-To: <20090130.170240.56227421577108127.yamato@redhat.com> References: <20090130.170240.56227421577108127.yamato@redhat.com> Message-ID: <20090202.163414.938233730596300941.yamato@redhat.com> Could you apply this patch if appreciated? Masatake YAMATO Index: exec/totemsrp.c =================================================================== --- exec/totemsrp.c (revision 1752) +++ exec/totemsrp.c (working copy) @@ -1534,7 +1534,7 @@ sizeof (struct iovec) * recovery_message_item->iov_len); } else { mcast = recovery_message_item->iovec[0].iov_base; - if (mcast->header.encapsulated == 1) { + if (mcast->header.encapsulated == MESSAGE_ENCAPSULATED) { /* * Message is a recovery message encapsulated * in a new ring message From mockey.chen at nsn.com Mon Feb 2 07:41:25 2009 From: mockey.chen at nsn.com (Mockey Chen) Date: Mon, 02 Feb 2009 15:41:25 +0800 Subject: [Linux-cluster] Custom recovery policy: restart then relocate ? Message-ID: <4986A3A5.9090207@nsn.com> Hi, I want to one of my cluster service's recovery policy like this: When detected the service failed, first try to restart it, if restart failed, then try to relocate it. I can one choose restart or relocate in configure, anybody can give some hints. Thanks. From sdake at redhat.com Mon Feb 2 07:59:33 2009 From: sdake at redhat.com (Steven Dake) Date: Mon, 02 Feb 2009 00:59:33 -0700 Subject: [Linux-cluster] [PATCH] use defined constant instead of raw literal in totemsrp.c In-Reply-To: <20090202.163414.938233730596300941.yamato@redhat.com> References: <20090130.170240.56227421577108127.yamato@redhat.com> <20090202.163414.938233730596300941.yamato@redhat.com> Message-ID: <1233561573.6115.1.camel@sdake-laptop> Masatake, My ssh keys are currently in limbo but once they are sorted out I will merge this patch. Thanks for the work! Regards -steve On Mon, 2009-02-02 at 16:34 +0900, Masatake YAMATO wrote: > Could you apply this patch if appreciated? 
> > > Masatake YAMATO > > > Index: exec/totemsrp.c > =================================================================== > --- exec/totemsrp.c (revision 1752) > +++ exec/totemsrp.c (working copy) > @@ -1534,7 +1534,7 @@ > sizeof (struct iovec) * recovery_message_item->iov_len); > } else { > mcast = recovery_message_item->iovec[0].iov_base; > - if (mcast->header.encapsulated == 1) { > + if (mcast->header.encapsulated == MESSAGE_ENCAPSULATED) { > /* > * Message is a recovery message encapsulated > * in a new ring message > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From theophanis_kontogiannis at yahoo.gr Mon Feb 2 09:04:29 2009 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Mon, 2 Feb 2009 11:04:29 +0200 Subject: [Linux-cluster] Custom recovery policy: restart then relocate ? In-Reply-To: <4986A3A5.9090207@nsn.com> References: <4986A3A5.9090207@nsn.com> Message-ID: <001f01c98515$42eb8af0$c8c2a0d0$@gr> Hi Mockey, If you choose "restart" then what you want to do is the default behavior for clustered services. If you choose "restart" as policy, then the service will be first restarted, and if it fails it will be relocated (as long as you have the service, run on any cluster member and not only on one cluster member). Sincerely, Theophanis Kontogiannis > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster- > bounces at redhat.com] On Behalf Of Mockey Chen > Sent: Monday, February 02, 2009 9:41 AM > To: linux clustering > Subject: [Linux-cluster] Custom recovery policy: restart then relocate ? > > Hi, > > I want to one of my cluster service's recovery policy like this: > > When detected the service failed, first try to restart it, if restart > failed, then try to relocate it. > > I can one choose restart or relocate in configure, anybody can give some > hints. > > Thanks. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From yamato at redhat.com Mon Feb 2 10:33:54 2009 From: yamato at redhat.com (Masatake YAMATO) Date: Mon, 02 Feb 2009 19:33:54 +0900 (JST) Subject: [Linux-cluster] [patch] website about crypt.c In-Reply-To: <001f01c98515$42eb8af0$c8c2a0d0$@gr> References: <4986A3A5.9090207@nsn.com> <001f01c98515$42eb8af0$c8c2a0d0$@gr> Message-ID: <20090202.193354.42662687400854549.yamato@redhat.com> It seems that libtomcrypt.org is moved. Masatake YAMATO Index: exec/crypto.c =================================================================== --- exec/crypto.c (revision 1752) +++ exec/crypto.c (working copy) @@ -6,7 +6,7 @@ * The library is free for all purposes without any express * guarantee it works. * - * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.org + * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.com/ */ #include #include From fdinitto at redhat.com Tue Feb 3 05:34:38 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 03 Feb 2009 06:34:38 +0100 Subject: [Linux-cluster] Cluster 3.0.0.alpha4 released Message-ID: <1233639278.3835.1.camel@cerberus.int.fabbione.net> The cluster team and its community are proud to announce the 3.0.0.alpha4 release from the STABLE3 branch. The development cycle for 3.0.0 is about to end. With the new STABLE3 branch that will collect only bug fixes and minimal update required to build on top of the latest upstream kernel, we are getting closer and closer to a shiny new stable release. 
Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 3.0.0.alpha releases and more important report problems. This is the time for people to make a difference and help us testing as much as possible. In order to build the 3.0.0.alpha4 release you will need: - corosync svn r1756. - openais svn r1688. - linux kernel 2.6.28. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-3.0.0.alpha4.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-3.0.0.alpha4.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 3.0.0.alpha3): Abhijith Das (1): gfs2-utils: Bug 481762 - No longer able to mount GFS volume with noatime,noquota options Christine Caulfield (1): cman: make cman_tool show node names if possible Fabio M. Di Nitto (17): ccs: allow random config_versions build: propagate prefix information into defines.mk build: clean .pc files build: first cut at generating .pc files build: export INCDIR and LIBDIR to pc files build: add pkgconfig dir notion to defines.mk.input build: add knowledge for install/uninstall of .pc files build: update .gitignore logthread: add pkgconfig support libcman: add pkgconfig support libccs: add pkgconfig support libdlmcontrol: add pkgconfig support libfence: add pkgconfig support libdlm: add pkgconfig support contrib: fix libaislock build and linking libaislock: add pkgconfig support build: stop publishing to sources.redhat.com automatically Marek 'marx' Grac (1): [RGMANAGER] Resolves #483093 - samba.sh tries to kill the wrong pid file .gitignore | 1 + cman/cman_tool/main.c | 15 ++++++++++++--- cman/lib/libcman.pc.in | 11 +++++++++++ common/liblogthread/liblogthread.pc.in | 11 +++++++++++ config/libs/libccsconfdb/libccs.c | 19 +++++++++---------- config/libs/libccsconfdb/libccs.pc.in | 11 +++++++++++ configure | 1 + contrib/libaislock/Makefile | 2 ++ contrib/libaislock/libaislock.pc.in | 11 +++++++++++ dlm/libdlm/Makefile | 22 +++++++++++++++++++++- dlm/libdlm/libdlm.pc.in | 11 +++++++++++ dlm/libdlm/libdlm_lt.pc.in | 11 +++++++++++ dlm/libdlmcontrol/libdlmcontrol.pc.in | 11 +++++++++++ fence/libfence/libfence.pc.in | 11 +++++++++++ gfs2/mount/util.c | 4 ++++ make/clean.mk | 2 +- make/defines.mk.input | 2 ++ make/install.mk | 4 ++++ make/libs.mk | 14 +++++++++++++- make/release.mk | 5 ----- make/uninstall.mk | 3 +++ rgmanager/src/resources/samba.sh | 4 ++-- 22 files changed, 163 insertions(+), 23 deletions(-) From klakshman03 at hotmail.com Tue Feb 3 11:04:38 2009 From: klakshman03 at hotmail.com (lakshmana swamy) Date: Tue, 3 Feb 2009 16:34:38 +0530 Subject: [Linux-cluster] Apache HA cluster issues Message-ID: Hi friends I have some doubts on apache HA Clustering in RHEL 5.2 I have implemented Shared GFS in two nodes. Now I want Implement Apache Webserver clustering on the same nodes. I want to use file system as shared GFS, for Apache cluster nodes. 1. While configuring the HA, do I need to configure a " Resource" for GFS ? 2. What is the " Shutdown wait (Seconds)" option in "Apache Server" resource. 3. What are the required Resources for Apache Clustering? Kindly Help me, to fix the above Issues. 
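For reference, a rough sketch of the kind of rgmanager service definition (in /etc/cluster/cluster.conf) that these questions are about; every name, address and path below is an assumption, not a recommendation:

<resources>
    <clusterfs name="gfs-share" device="/dev/sdb1" mountpoint="/var/www/html" fstype="gfs" force_unmount="0"/>
    <ip address="192.168.1.100" monitor_link="1"/>
    <apache name="httpd" server_root="/etc/httpd" config_file="conf/httpd.conf" shutdown_wait="10"/>
</resources>
<service name="webserver" autostart="1">
    <clusterfs ref="gfs-share"/>
    <ip ref="192.168.1.100"/>
    <apache ref="httpd"/>
</service>

The clusterfs resource is how a GFS mount is expressed inside a service when rgmanager should manage it (a GFS volume can also simply be mounted on every node outside rgmanager), and shutdown_wait corresponds to the "Shutdown wait (Seconds)" field: the number of seconds the apache agent allows httpd for a clean shutdown when the service is stopped or relocated.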
Thanking you Laxman _________________________________________________________________ For the freshest Indian Jobs Visit MSN Jobs http://www.in.msn.com/jobs -------------- next part -------------- An HTML attachment was scrubbed... URL: From yamato at redhat.com Tue Feb 3 11:52:16 2009 From: yamato at redhat.com (Masatake YAMATO) Date: Tue, 03 Feb 2009 20:52:16 +0900 (JST) Subject: [Linux-cluster] packet dissectors for totemnet and totemsrp of corosync Message-ID: <20090203.205216.547030186854654083.yamato@redhat.com> I've written wireshark packet dissectors for totemnet and totemsrp of corosync. See the attached png file. I have already submitted the patch to wireshark developers. https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232 Currently you have to get the patch from the bugziila and apply it to the original source code by yourself. I hope the patch is merged in soon. The greatest point is this patch supports decryption: you don't have to turn off "secauth" before capturing. However, "key" is needed to decrypt the packet. If you don't specified explicitly in cluster.conf, the name of cluster is the key. See the bugziila entry how to give the key to wireshark. I'm working on upper layer protocols like totempg, cman, "a", clvmd, and so on. Enjoy! Masatake YAMATO -------------- next part -------------- A non-text attachment was scrubbed... Name: corosync0.png Type: image/png Size: 148206 bytes Desc: not available URL: From Alain.Moulle at bull.net Tue Feb 3 13:02:27 2009 From: Alain.Moulle at bull.net (Alain.Moulle) Date: Tue, 03 Feb 2009 14:02:27 +0100 Subject: [Linux-cluster] cman 2.0.98 // return code on mkqdisk -L has been changed ? Message-ID: <49884063.6000508@bull.net> Hi It seems that between cman 2.0.73 and cman 2.0.98, the return code on mkqdisk -L has been changed when there is no qdisk configured : -2.0.73 : mkqdisk -L returned 0 even if there was no quorum disk configured -2.0.98 : mkqdisk -L returns 255 if no quorum has been configured. And I can't see any trace about this change in the cman.spec %changelog record. Is the last one the definitive behavior ? or is there an error somewhere? Thanks Regards Alain -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdake at redhat.com Tue Feb 3 15:29:29 2009 From: sdake at redhat.com (Steven Dake) Date: Tue, 03 Feb 2009 08:29:29 -0700 Subject: [Linux-cluster] Re: [Openais] packet dissectors for totemnet and totemsrp of corosync In-Reply-To: <20090203.205216.547030186854654083.yamato@redhat.com> References: <20090203.205216.547030186854654083.yamato@redhat.com> Message-ID: <1233674969.25292.45.camel@sdake-laptop> Wow. Nice work. Is upstream wireshark likely to accept the patch? regards -steve On Tue, 2009-02-03 at 20:52 +0900, Masatake YAMATO wrote: > I've written wireshark packet dissectors for totemnet and > totemsrp of corosync. > > See the attached png file. > > I have already submitted the patch to wireshark developers. > > https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=3232 > > Currently you have to get the patch from the bugziila and apply it to > the original source code by yourself. I hope the patch is merged in > soon. > > The greatest point is this patch supports decryption: you don't have > to turn off "secauth" before capturing. However, "key" is needed to > decrypt the packet. If you don't specified explicitly in > cluster.conf, the name of cluster is the key. See the bugziila entry > how to give the key to wireshark. 
> > I'm working on upper layer protocols like totempg, cman, "a", clvmd, > and so on. > > Enjoy! > > > Masatake YAMATO > _______________________________________________ > Openais mailing list > Openais at lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/openais From kadlec at mail.kfki.hu Tue Feb 3 15:44:10 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Tue, 3 Feb 2009 16:44:10 +0100 (CET) Subject: [Linux-cluster] Performance degradation after reboot Message-ID: Hello, Due to a major power restructuring we had to shutdown our GFS cluster at Saturday. Since then we have been suffering a serious performance degradation. Previously system load was usually less than 1, in spikes 3-4. Yesterday we had 180(!), without no apparent reason: network interfaces are OK (settings just right, no error/packet loss), no settings modified, usage of the cluster did not change. GFS is over AoE: the Coraid boxes are just fine, no RAID degradation. At starting up, ntpd on some systems could not set the system clock as it was off by more than 180s. We fixed that, rebooted the systems one by one just in case, helped nothing. What is more strange, when the init script issues the command gfs_tool settune /gfs/home statfs_fast 1 it takes quite a lot of time, around 15-20s. What could go wrong, on a nicely working system? Might there be filesystem inconsistencies, which can produce such slowdown and we should run gfs_fsck? The gfs parameters which are tuned: statfs_slots 128 statfs_fast 1 demote_secs 30 glock_purge 50 scand_secs 3 [This one was added today.] Any idea can be useful. Best regards, Jzosef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From marcos.david at efacec.pt Tue Feb 3 18:31:58 2009 From: marcos.david at efacec.pt (Marcos David) Date: Tue, 03 Feb 2009 18:31:58 +0000 Subject: [Linux-cluster] FailOver Domains not working properly Message-ID: <49888D9E.9070003@efacec.pt> Hi, I have a 4 node cluster using RHEL 5.3. I have 4 services which I want to spread trough the servers so I can have some load-balancing. Each server should run one of the services when they are enabled, but what is happening is that the services always start on the node from which I enabled them. My configuration is this: FailOverDomain1 -> includes node1 only, unrestricted, unordered FailOverDomain2 -> includes node2 only, unrestricted, unordered FailOverDomain3 -> includes node3 only, unrestricted, unordered FailOverDomain4 -> includes node4 only, unrestricted, unordered service1 is allocated to FailOverDomain1 service2 is allocated to FailOverDomain2 service3 is allocated to FailOverDomain3 service4 is allocated to FailOverDomain4 when I execute: clusvcadm -e service1 clusvcadm -e service2 clusvcadm -e service3 clusvcadm -e service4 all the services start on the same node (the one where I executed the above commands) Shouldn't they start on the node in their respective failover domain? Am I doing something wrong? Thanks for the help. 
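For reference, the layout described above corresponds to something like this in the <rm> block of /etc/cluster/cluster.conf (a sketch using the names from the message, not a verified configuration):

<failoverdomains>
    <failoverdomain name="FailOverDomain1" restricted="0" ordered="0">
        <failoverdomainnode name="node1"/>
    </failoverdomain>
    <failoverdomain name="FailOverDomain2" restricted="0" ordered="0">
        <failoverdomainnode name="node2"/>
    </failoverdomain>
    <!-- FailOverDomain3/node3 and FailOverDomain4/node4 follow the same pattern -->
</failoverdomains>
<service name="service1" domain="FailOverDomain1" autostart="1"> ... </service>

An unrestricted, unordered domain of one member only expresses a preference: any node may still run the service, and clusvcadm -e run on a node will normally bring the service up locally, which is the behaviour the replies below address.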
From robejrm at gmail.com Tue Feb 3 18:40:05 2009 From: robejrm at gmail.com (Juan Ramon Martin Blanco) Date: Tue, 3 Feb 2009 19:40:05 +0100 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <49888D9E.9070003@efacec.pt> References: <49888D9E.9070003@efacec.pt> Message-ID: <8a5668960902031040n6ee99e61padacd78a1e040c80@mail.gmail.com> On Tue, Feb 3, 2009 at 7:31 PM, Marcos David wrote: > Hi, > I have a 4 node cluster using RHEL 5.3. > > I have 4 services which I want to spread trough the servers so I can > have some load-balancing. > Each server should run one of the services when they are enabled, but > what is happening is that the services always start on the node from > which I enabled them. > > My configuration is this: > > FailOverDomain1 -> includes node1 only, unrestricted, unordered > FailOverDomain2 -> includes node2 only, unrestricted, unordered > FailOverDomain3 -> includes node3 only, unrestricted, unordered > FailOverDomain4 -> includes node4 only, unrestricted, unordered > Hi Marcos, >From the Red Hat documentation: A failover domain is a named subset of cluster members that are eligible to run a cluster service in the event of a system failure. A failover domain can have the following characteristics: - Unrestricted ? Allows you to specify that a subset of members are preferred, but that a cluster service assigned to this domain can run on any available member. - Restricted ? Allows you to restrict the members that can run a particular cluster service. If none of the members in a restricted failover domain are available, the cluster service cannot be started (either manually or by the cluster software). - Unordered ? When a cluster service is assigned to an unordered failover domain, the member on which the cluster service runs is chosen from the available failover domain members with no priority ordering. - Ordered ? Allows you to specify a preference order among the members of a failover domain. The member at the top of the list is the most preferred, followed by the second member in the list, and so on. By default, failover domains are unrestricted and unordered. Configure them as restricted Greetings, Juanra > > > service1 is allocated to FailOverDomain1 > service2 is allocated to FailOverDomain2 > service3 is allocated to FailOverDomain3 > service4 is allocated to FailOverDomain4 > > when I execute: > clusvcadm -e service1 > clusvcadm -e service2 > clusvcadm -e service3 > clusvcadm -e service4 > > all the services start on the same node (the one where I executed the > above commands) > > Shouldn't they start on the node in their respective failover domain? > Am I doing something wrong? > > Thanks for the help. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apfaffeneder at pfaffeneder.org Tue Feb 3 18:46:05 2009 From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder) Date: Tue, 03 Feb 2009 19:46:05 +0100 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <49888D9E.9070003@efacec.pt> References: <49888D9E.9070003@efacec.pt> Message-ID: <498890ED.90605@pfaffeneder.org> Marcos David wrote: > Hi, > I have a 4 node cluster using RHEL 5.3. > > I have 4 services which I want to spread trough the servers so I can > have some load-balancing. 
> Each server should run one of the services when they are enabled, but > what is happening is that the services always start on the node from > which I enabled them. > > when I execute: Try running clusvcadm -e -F and also man clusvcadm. Cheers Andreas From marcos.david at efacec.pt Tue Feb 3 18:53:57 2009 From: marcos.david at efacec.pt (Marcos David) Date: Tue, 03 Feb 2009 18:53:57 +0000 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <8a5668960902031040n6ee99e61padacd78a1e040c80@mail.gmail.com> References: <49888D9E.9070003@efacec.pt> <8a5668960902031040n6ee99e61padacd78a1e040c80@mail.gmail.com> Message-ID: <498892C5.8000701@efacec.pt> Juan Ramon Martin Blanco wrote: > > > On Tue, Feb 3, 2009 at 7:31 PM, Marcos David > wrote: > > Hi, > I have a 4 node cluster using RHEL 5.3. > > I have 4 services which I want to spread trough the servers so I can > have some load-balancing. > Each server should run one of the services when they are enabled, but > what is happening is that the services always start on the node from > which I enabled them. > > My configuration is this: > > FailOverDomain1 -> includes node1 only, unrestricted, unordered > FailOverDomain2 -> includes node2 only, unrestricted, unordered > FailOverDomain3 -> includes node3 only, unrestricted, unordered > FailOverDomain4 -> includes node4 only, unrestricted, unordered > > Hi Marcos, > > From the Red Hat documentation: > > A failover domain is a named subset of cluster members that are > eligible to run a cluster service in the event of a system failure. A > failover domain can have the following characteristics: > > * > > Unrestricted ? Allows you to specify that a subset of members > are preferred, but that a cluster service assigned to this > domain can run on any available member. > > * > > Restricted ? Allows you to restrict the members that can run a > particular cluster service. If none of the members in a > restricted failover domain are available, the cluster service > cannot be started (either manually or by the cluster software). > > * > > Unordered ? When a cluster service is assigned to an unordered > failover domain, the member on which the cluster service runs is > chosen from the available failover domain members with no > priority ordering. > > * > > Ordered ? Allows you to specify a preference order among the > members of a failover domain. The member at the top of the list > is the most preferred, followed by the second member in the > list, and so on. > > By default, failover domains are unrestricted and unordered. > > > Configure them as restricted > > > Greetings, > > Juanra > > > > > > service1 is allocated to FailOverDomain1 > service2 is allocated to FailOverDomain2 > service3 is allocated to FailOverDomain3 > service4 is allocated to FailOverDomain4 > > when I execute: > clusvcadm -e service1 > clusvcadm -e service2 > clusvcadm -e service3 > clusvcadm -e service4 > > all the services start on the same node (the one where I executed the > above commands) > > Shouldn't they start on the node in their respective failover domain? > Am I doing something wrong? > > Thanks for the help. 
> > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi, I based my configuration on the redhat documention: Preferred node or preferred member: This is a notion which is no longer present in rgmanager. In older versions, the preferred node was the member designated to run a given service if the member is online. In most cases, it was used with the "Relocate on Preferred Node Boot" service option (as it was generally thought to be useless without!). *In newer rgmanagers, we can emulate this behavior by specifying an unordered, unrestricted failover domain of exactly one member.* There is no equivalent to the "Relocate on Preferred Node Boot" option in Cluster Manager 1.0.x. So it was supposed to work. Or is the restricted option necessary in my case? Greetings, Marcos David -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.david at efacec.pt Tue Feb 3 18:55:36 2009 From: marcos.david at efacec.pt (Marcos David) Date: Tue, 03 Feb 2009 18:55:36 +0000 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <498890ED.90605@pfaffeneder.org> References: <49888D9E.9070003@efacec.pt> <498890ED.90605@pfaffeneder.org> Message-ID: <49889328.2000609@efacec.pt> Andreas Pfaffeneder wrote: > Marcos David wrote: > >> Hi, >> I have a 4 node cluster using RHEL 5.3. >> >> I have 4 services which I want to spread trough the servers so I can >> have some load-balancing. >> Each server should run one of the services when they are enabled, but >> what is happening is that the services always start on the node from >> which I enabled them. >> >> when I execute: >> > > Try running clusvcadm -e -F > and also > man clusvcadm. > > Cheers > Andreas > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > Hi, But what will be the behaviour when the servers boot? Will the services be assigned to the same node? Or will they automatically relocate to their respective node? Greetings, Marcos David -------------- next part -------------- An HTML attachment was scrubbed... URL: From marcos.david at efacec.pt Tue Feb 3 19:07:15 2009 From: marcos.david at efacec.pt (Marcos David) Date: Tue, 03 Feb 2009 19:07:15 +0000 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <498890ED.90605@pfaffeneder.org> References: <49888D9E.9070003@efacec.pt> <498890ED.90605@pfaffeneder.org> Message-ID: <498895E3.30506@efacec.pt> Andreas Pfaffeneder wrote: > Marcos David wrote: > >> Hi, >> I have a 4 node cluster using RHEL 5.3. >> >> I have 4 services which I want to spread trough the servers so I can >> have some load-balancing. >> Each server should run one of the services when they are enabled, but >> what is happening is that the services always start on the node from >> which I enabled them. >> >> when I execute: >> > > Try running clusvcadm -e -F > and also > man clusvcadm. > > Cheers > Andreas > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > Hi, I didn't know about the -F option, it is not listed in the man page. But it works :-) Thanks! Greetings, Marcos David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From apfaffeneder at pfaffeneder.org Tue Feb 3 19:11:48 2009 From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder) Date: Tue, 03 Feb 2009 20:11:48 +0100 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <49889328.2000609@efacec.pt> References: <49888D9E.9070003@efacec.pt> <498890ED.90605@pfaffeneder.org> <49889328.2000609@efacec.pt> Message-ID: <498896F4.8060007@pfaffeneder.org> Marcos David wrote: > Andreas Pfaffeneder wrote: >> Marcos David wrote: >> >>> Hi, I have a 4 node cluster using RHEL 5.3. >>> >>> I have 4 services which I want to spread trough the servers so >>> I can have some load-balancing. Each server should run one of >>> the services when they are enabled, but what is happening is >>> that the services always start on the node from which I enabled >>> them. >>> >>> when I execute: >>> >> >> Try running clusvcadm -e -F and also man clusvcadm. >> >> Cheers Andreas > Hi, > > But what will be the behaviour when the servers boot? Will the > services be assigned to the same node? Or will they automatically > relocate to their respective node? > > Greetings, Marcos David Hi Marcos, I think you have to live with the improper balanced resources until you got a maintenance window to redistribute them. Cheers Andreas From apfaffeneder at pfaffeneder.org Tue Feb 3 19:25:01 2009 From: apfaffeneder at pfaffeneder.org (Andreas Pfaffeneder) Date: Tue, 03 Feb 2009 20:25:01 +0100 Subject: [Linux-cluster] FailOver Domains not working properly In-Reply-To: <498895E3.30506@efacec.pt> References: <49888D9E.9070003@efacec.pt> <498890ED.90605@pfaffeneder.org> <498895E3.30506@efacec.pt> Message-ID: <49889A0D.1090806@pfaffeneder.org> Marcos David wrote: >> > Hi, I didn't know about the -F option, it is not listed in the man > page. Then please pardon my "man foobar". You can find the doc here: http://sources.redhat.com/cluster/wiki/FAQ/RGManager#rgm_svcstart Maybe someone at RH should maintenance the man pages. cheers Andreas From yamato at redhat.com Wed Feb 4 04:09:42 2009 From: yamato at redhat.com (Masatake YAMATO) Date: Wed, 04 Feb 2009 13:09:42 +0900 (JST) Subject: [Linux-cluster] Re: [Openais] packet dissectors for totemnet and totemsrp of corosync In-Reply-To: <1233674969.25292.45.camel@sdake-laptop> References: <20090203.205216.547030186854654083.yamato@redhat.com> <1233674969.25292.45.camel@sdake-laptop> Message-ID: <20090204.130942.78666345105965715.yamato@redhat.com> > Wow. > > Nice work. Thank you. If you find better field descriptions, please, let me know. > Is upstream wireshark likely to accept the patch? I have not got any feed back yet. (I submitted this patch yesterday.) Masatake YAMATO From fdinitto at redhat.com Wed Feb 4 07:27:39 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Wed, 04 Feb 2009 08:27:39 +0100 Subject: [Linux-cluster] Re: [Cluster-devel] Cluster IRC meeting - Monday 2nd of Feb 2pm UTC/GMT In-Reply-To: <1232971513.10253.16.camel@cerberus.int.fabbione.net> References: <1232971513.10253.16.camel@cerberus.int.fabbione.net> Message-ID: <1233732459.3835.19.camel@cerberus.int.fabbione.net> On Mon, 2009-01-26 at 13:05 +0100, Fabio M. Di Nitto wrote: > Hi everybody, > > When : Monday 2nd of Feb 2pm UTC/GMT (*)(**) > Where : irc.freenode.net #linux-cluster > Who : everybody interested is invited to participate > Agenda: http://sources.redhat.com/cluster/wiki/Meetings/2009-Feb-02 http://sources.redhat.com/cluster/wiki/Meetings/2009-Feb-02/irclogs IRC logs of the meeting are now available at the above URL. 
Cheers Fabio From poknam at gmail.com Wed Feb 4 08:25:41 2009 From: poknam at gmail.com (PN) Date: Wed, 4 Feb 2009 16:25:41 +0800 Subject: [Linux-cluster] Some problems on using GFS Message-ID: <92daa7bf0902040025l8a8dc31pda97d40985ca2da1@mail.gmail.com> Dear all, We found some problems on using GFS. We are using RHEL 4U5 with CentOS 4U5 CSGFS RPMs. Given the following PHP program (for testing): The above program only do a simple task: Copy the file "src" to a tmp file, and then rename the tmp file to "dest". where "/data08/test/" is a directory mounted on GFS. When two or more instances of this program running concurrently in *different nodes*, the rename operation was *failed*. But there is *no problem* when two or more instance of this program running concurrently in the *same node*. *Example stderr when running the above program:* PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720656.1111_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720656.9956_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720659.7083_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720661.1308_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720662.5406_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720663.9784_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error PHP Warning: rename(/data08/proga15_test/20090204/tmp_1233720665.5586_mp-bla03_18274,/data08/proga15_test/20090204/dest): No such file or directory in /data08.mount/proga15_test/20090204/t.php on line 14 error Then we've added a while loop before rename. while ( true ) { $tmp = $tmp_prefix . '_' . microtime(true) . '_' . $hostname . '_' . $pid; copy( $input, $tmp ); * while (!file_exists($tmp)) {}* $is_success = rename( $tmp, $dest ); if ( ! $is_success ) { echo 'error' . "\n\n"; } } The file really exists before rename. It does not generate an infinite loop and the same error occurred. Any suggestion or idea? Thanks a lot. Regards, PN -------------- next part -------------- An HTML attachment was scrubbed... URL: From kadlec at mail.kfki.hu Wed Feb 4 10:39:09 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Wed, 4 Feb 2009 11:39:09 +0100 (CET) Subject: [Linux-cluster] Performance degradation after reboot In-Reply-To: References: Message-ID: Hello, On Tue, 3 Feb 2009, Kadlecsik Jozsef wrote: > Due to a major power restructuring we had to shutdown our GFS cluster at > Saturday. Since then we have been suffering a serious performance > degradation. Previously system load was usually less than 1, in spikes > 3-4. Yesterday we had 180(!), without no apparent reason: network > interfaces are OK (settings just right, no error/packet loss), no settings > modified, usage of the cluster did not change. 
GFS is over AoE: the Coraid > boxes are just fine, no RAID degradation. Sorry for the noise: it turned out that one interface did go wrong: instead of 1000Mb/s it worked at 10Mb/s. Sigh. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From randy.brown at noaa.gov Wed Feb 4 16:00:24 2009 From: randy.brown at noaa.gov (Randy Brown) Date: Wed, 04 Feb 2009 11:00:24 -0500 Subject: [Linux-cluster] Issue with KDE with kernel later than 2.6.18-53.1.21 Message-ID: <4989BB98.2040208@noaa.gov> We are seeing a very strange issue with KDE with all kernels later than 2.6.18-53.1.21 on our cluster nodes. When users attempt to log into the machine using KDE, it never makes it to the KDE splash screen. The machine just hangs on a solid blue screen. I can then restart the Xserver and login using Gnome, and it logs in without any issues. Unfortunately, for us, KDE is the standard desktop environment used by our development group. I can reboot the the cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome work as expected. Obviously, I'd prefer to be running the latest kernel. I will gladly provide more info if needed. Any help or suggestions would be greatly appreciated. Randy Info about our cluster: It is a 2 node cluster running Centos 5 with up to date patches. I am using the cluster as a NAS head to serve NFS mounts out to our network from an iscsi SAN. Current running kernel: 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) Most current installed kernel: 2.6.18-92.1.22.el5 Cluster packages: lvm2-cluster-2.02.32-4.el5 cman-2.0.84-2.el5_2.3 gfs2-utils-0.1.44-1.el5_2.1 rgmanager-2.0.38-2.el5_2.1 -------------- next part -------------- A non-text attachment was scrubbed... Name: randy_brown.vcf Type: text/x-vcard Size: 313 bytes Desc: not available URL: From cluster at xinet.it Wed Feb 4 16:57:23 2009 From: cluster at xinet.it (Cluster Management) Date: Wed, 4 Feb 2009 17:57:23 +0100 Subject: [Linux-cluster] clvmd blocked Message-ID: <01a801c986e9$a69f27c0$f3dd7740$@it> Hi all, i have just installed a 3 nodes cluster. I started clvmd and now it is impossibile to stop it (service clvmd stop, killall clvmd etc..). It looks like locked. How can i stop and restarti t without restarting the server? My only configurations are: 1) ISCSI lun connected on machine; 2) manual_fencing; 3) Configured /etc/lvm/lvm.conf with "locking_type=3" option. After a short time (something like 10 minutes) server 1 and server 2 crashed! Regards, -- Francesco -------------- next part -------------- An HTML attachment was scrubbed... URL: From jumanjiman at gmail.com Wed Feb 4 17:07:13 2009 From: jumanjiman at gmail.com (Paul Morgan) Date: Wed, 4 Feb 2009 11:07:13 -0600 Subject: [Linux-cluster] clvmd blocked In-Reply-To: <01a801c986e9$a69f27c0$f3dd7740$@it> References: <01a801c986e9$a69f27c0$f3dd7740$@it> Message-ID: <1b6fc7dd0902040907ya91b83cn265bb64a2b643f96@mail.gmail.com> 2009/2/4 Cluster Management : > i have just installed a 3 nodes cluster. I started clvmd and now it is > impossibile to stop it (service clvmd stop, killall clvmd etc..). It looks > like locked. How can i stop and restarti t without restarting the server? clvmd depends on cman and cluster infrastructure to communicate LVM metadata. You probably need to use fence_tool to leave the fence domain before stopping clvmd. 
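For context, the usual teardown order on a RHEL 5 node looks roughly like this (a sketch assuming stock init scripts and a GFS mount at /share; adjust to the real setup):

umount /share            # unmount any GFS filesystems first
service rgmanager stop   # if rgmanager is running
service clvmd stop
fence_tool leave         # drop out of the fence domain (the cman init script normally does this during its stop)
service cman stop

If clvmd still hangs at that point it is usually blocked on the DLM, which with manual fencing often means an unacknowledged fence request (see fence_ack_manual) or lost quorum rather than a problem in clvmd itself.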
hth, -paul From Danny.Wall at health-first.org Wed Feb 4 15:36:14 2009 From: Danny.Wall at health-first.org (Danny Wall) Date: Wed, 04 Feb 2009 10:36:14 -0500 Subject: [Linux-cluster] Moving clvm filesystem to lvm Message-ID: <49896F9E020000C8000112E3@health-first.org> I have RHEL 4.3 with RHCS and GFS in two nodes. It is currently experiencing problems daily. When I try to ls on the directory, it never finishes. Usually, it takes a few minutes to give the directory listing. If I restart the cluster services, it begins working again. I am in the process of building a new cluster based on RHEl 5, so I would like to move the SAN GFS filesystem to a standalone server to stop the daily problems. Will I have a problem since the filesystem was created for a cluster using clvmd, and I want to move it to LVM2 on a RHEL5 standalone server? My understanding is the clvm does not have any different layout, it just communicates to other nodes, so this should not be a problem. Thanks Danny Wall ##################################### This message is for the named person's use only. It may contain private, proprietary, or legally privileged information. No privilege is waived or lost by any mistransmission. If you receive this message in error, please immediately delete it and all copies of it from your system, destroy any hard copies of it, and notify the sender. You must not, directly or indirectly, use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. Health First reserves the right to monitor all e-mail communications through its networks. Any views or opinions expressed in this message are solely those of the individual sender, except (1) where the message states such views or opinions are on behalf of a particular entity; and (2) the sender is authorized by the entity to give such views or opinions. ##################################### From corey.kovacs at gmail.com Wed Feb 4 17:37:02 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Wed, 4 Feb 2009 17:37:02 +0000 Subject: [Linux-cluster] Issue with KDE with kernel later than 2.6.18-53.1.21 In-Reply-To: <4989BB98.2040208@noaa.gov> References: <4989BB98.2040208@noaa.gov> Message-ID: <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> I had a similar problem. Reducing my rsize back to 8k from 32k fixed it. Problem root was the lock files in the .qt directory. Regards, Corey On Feb 4, 2009, at 16:00, Randy Brown wrote: > We are seeing a very strange issue with KDE with all kernels later > than 2.6.18-53.1.21 on our cluster nodes. When users attempt to log > into the machine using KDE, it never makes it to the KDE splash > screen. The machine just hangs on a solid blue screen. I can then > restart the Xserver and login using Gnome, and it logs in without > any issues. Unfortunately, for us, KDE is the standard desktop > environment used by our development group. I can reboot the the > cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome work as > expected. Obviously, I'd prefer to be running the latest kernel. I > will gladly provide more info if needed. Any help or suggestions > would be greatly appreciated. > > Randy > > Info about our cluster: > It is a 2 node cluster running Centos 5 with up to date patches. I > am using the cluster as a NAS head to serve NFS mounts out to our > network from an iscsi SAN. 
> > Current running kernel: > 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) > > Most current installed kernel: > 2.6.18-92.1.22.el5 > > Cluster packages: > lvm2-cluster-2.02.32-4.el5 > cman-2.0.84-2.el5_2.3 > gfs2-utils-0.1.44-1.el5_2.1 > rgmanager-2.0.38-2.el5_2.1 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From randy.brown at noaa.gov Wed Feb 4 18:55:02 2009 From: randy.brown at noaa.gov (Randy Brown) Date: Wed, 04 Feb 2009 13:55:02 -0500 Subject: [Linux-cluster] Issue with KDE with kernel later than 2.6.18-53.1.21 In-Reply-To: <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> References: <4989BB98.2040208@noaa.gov> <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> Message-ID: <4989E486.90308@noaa.gov> Great. Thanks. I'll give that a shot. I just need to find umount and mount that file system to see if it worked. Randy Corey Kovacs wrote: > I had a similar problem. Reducing my rsize back to 8k from 32k fixed > it. Problem root was the lock files in the .qt directory. > > > Regards, > > Corey > > On Feb 4, 2009, at 16:00, Randy Brown wrote: > >> We are seeing a very strange issue with KDE with all kernels later >> than 2.6.18-53.1.21 on our cluster nodes. When users attempt to log >> into the machine using KDE, it never makes it to the KDE splash >> screen. The machine just hangs on a solid blue screen. I can then >> restart the Xserver and login using Gnome, and it logs in without any >> issues. Unfortunately, for us, KDE is the standard desktop >> environment used by our development group. I can reboot the the >> cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome work as >> expected. Obviously, I'd prefer to be running the latest kernel. I >> will gladly provide more info if needed. Any help or suggestions >> would be greatly appreciated. >> >> Randy >> >> Info about our cluster: >> It is a 2 node cluster running Centos 5 with up to date patches. I >> am using the cluster as a NAS head to serve NFS mounts out to our >> network from an iscsi SAN. >> >> Current running kernel: >> 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) >> >> Most current installed kernel: >> 2.6.18-92.1.22.el5 >> >> Cluster packages: >> lvm2-cluster-2.02.32-4.el5 >> cman-2.0.84-2.el5_2.3 >> gfs2-utils-0.1.44-1.el5_2.1 >> rgmanager-2.0.38-2.el5_2.1 >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- A non-text attachment was scrubbed... Name: randy_brown.vcf Type: text/x-vcard Size: 313 bytes Desc: not available URL: From randy.brown at noaa.gov Wed Feb 4 20:18:28 2009 From: randy.brown at noaa.gov (Randy Brown) Date: Wed, 04 Feb 2009 15:18:28 -0500 Subject: [Linux-cluster] Issue with KDE with kernel later than 2.6.18-53.1.21 In-Reply-To: <4989E486.90308@noaa.gov> References: <4989BB98.2040208@noaa.gov> <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> <4989E486.90308@noaa.gov> Message-ID: <4989F814.6060105@noaa.gov> I'm having trouble getting the file system to mount after adding the rsize parameter. 
Here is my fstab entry: /dev/vgdevl/shared /fs/shared gfs defaults 0 0 I changed it to: /dev/vgdevl/shared /fs/shared gfs defaults,rsize=8192 0 0 and when I try to mount the file system I get: [root at nfs2-cluster etc]# mount /fs/shared /sbin/mount.gfs: error mounting /dev/mapper/vgdevl-shared on /fs/shared: Invalid argument Looking though the man pages, it appears that the rsize switch should work. Any thoughts? Thanks, Randy Randy Brown wrote: > Great. Thanks. I'll give that a shot. I just need to find umount > and mount that file system to see if it worked. > > Randy > > Corey Kovacs wrote: >> I had a similar problem. Reducing my rsize back to 8k from 32k fixed >> it. Problem root was the lock files in the .qt directory. >> >> >> Regards, >> >> Corey >> >> On Feb 4, 2009, at 16:00, Randy Brown wrote: >> >>> We are seeing a very strange issue with KDE with all kernels later >>> than 2.6.18-53.1.21 on our cluster nodes. When users attempt to log >>> into the machine using KDE, it never makes it to the KDE splash >>> screen. The machine just hangs on a solid blue screen. I can then >>> restart the Xserver and login using Gnome, and it logs in without >>> any issues. Unfortunately, for us, KDE is the standard desktop >>> environment used by our development group. I can reboot the the >>> cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome work as >>> expected. Obviously, I'd prefer to be running the latest kernel. I >>> will gladly provide more info if needed. Any help or suggestions >>> would be greatly appreciated. >>> >>> Randy >>> >>> Info about our cluster: >>> It is a 2 node cluster running Centos 5 with up to date patches. I >>> am using the cluster as a NAS head to serve NFS mounts out to our >>> network from an iscsi SAN. >>> >>> Current running kernel: >>> 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) >>> >>> Most current installed kernel: >>> 2.6.18-92.1.22.el5 >>> >>> Cluster packages: >>> lvm2-cluster-2.02.32-4.el5 >>> cman-2.0.84-2.el5_2.3 >>> gfs2-utils-0.1.44-1.el5_2.1 >>> rgmanager-2.0.38-2.el5_2.1 >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: randy_brown.vcf Type: text/x-vcard Size: 313 bytes Desc: not available URL: From corey.kovacs at gmail.com Wed Feb 4 20:23:56 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Wed, 4 Feb 2009 20:23:56 +0000 Subject: [Linux-cluster] Issue with KDE with kernel later than 2.6.18-53.1.21 In-Reply-To: <4989F814.6060105@noaa.gov> References: <4989BB98.2040208@noaa.gov> <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> <4989E486.90308@noaa.gov> <4989F814.6060105@noaa.gov> Message-ID: <7d6e8da40902041223ub8f0d8ak1f4e5a65135ab6a8@mail.gmail.com> My mistake, I'd assumed (yes I know the implications of that word) that this was on an NFS mount point. the rsize attribute only pertains to NFS mounted filesystems -C 2009/2/4 Randy Brown : > I'm having trouble getting the file system to mount after adding the rsize > parameter. 
Here is my fstab entry: > > /dev/vgdevl/shared /fs/shared gfs defaults 0 0 > > I changed it to: > > /dev/vgdevl/shared /fs/shared gfs defaults,rsize=8192 > 0 0 > > and when I try to mount the file system I get: > > [root at nfs2-cluster etc]# mount /fs/shared > /sbin/mount.gfs: error mounting /dev/mapper/vgdevl-shared on /fs/shared: > Invalid argument > > Looking though the man pages, it appears that the rsize switch should work. > Any thoughts? > > Thanks, > > Randy > > Randy Brown wrote: >> >> Great. Thanks. I'll give that a shot. I just need to find umount and >> mount that file system to see if it worked. >> >> Randy >> >> Corey Kovacs wrote: >>> >>> I had a similar problem. Reducing my rsize back to 8k from 32k fixed it. >>> Problem root was the lock files in the .qt directory. >>> >>> >>> Regards, >>> >>> Corey >>> >>> On Feb 4, 2009, at 16:00, Randy Brown wrote: >>> >>>> We are seeing a very strange issue with KDE with all kernels later than >>>> 2.6.18-53.1.21 on our cluster nodes. When users attempt to log into the >>>> machine using KDE, it never makes it to the KDE splash screen. The machine >>>> just hangs on a solid blue screen. I can then restart the Xserver and login >>>> using Gnome, and it logs in without any issues. Unfortunately, for us, KDE >>>> is the standard desktop environment used by our development group. I can >>>> reboot the the cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome >>>> work as expected. Obviously, I'd prefer to be running the latest kernel. I >>>> will gladly provide more info if needed. Any help or suggestions would be >>>> greatly appreciated. >>>> >>>> Randy >>>> >>>> Info about our cluster: >>>> It is a 2 node cluster running Centos 5 with up to date patches. I am >>>> using the cluster as a NAS head to serve NFS mounts out to our network from >>>> an iscsi SAN. >>>> >>>> Current running kernel: >>>> 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) >>>> >>>> Most current installed kernel: >>>> 2.6.18-92.1.22.el5 >>>> >>>> Cluster packages: >>>> lvm2-cluster-2.02.32-4.el5 >>>> cman-2.0.84-2.el5_2.3 >>>> gfs2-utils-0.1.44-1.el5_2.1 >>>> rgmanager-2.0.38-2.el5_2.1 >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From stewart at epits.com.au Wed Feb 4 23:15:31 2009 From: stewart at epits.com.au (Stewart Walters) Date: Thu, 05 Feb 2009 08:15:31 +0900 Subject: [Linux-cluster] Issue with KDE with kernel later than In-Reply-To: <7d6e8da40902041223ub8f0d8ak1f4e5a65135ab6a8@mail.gmail.com> References: <4989BB98.2040208@noaa.gov> <10953D4E-1B1D-4535-980C-97DA3C51A060@gmail.com> <4989E486.90308@noaa.gov> <4989F814.6060105@noaa.gov> <7d6e8da40902041223ub8f0d8ak1f4e5a65135ab6a8@mail.gmail.com> Message-ID: <498A2193.7090301@epits.com.au> Corey Kovacs wrote: > My mistake, I'd assumed (yes I know the implications of that word) > that this was on an NFS mount point. the rsize attribute only pertains > to NFS mounted filesystems > > -C > > 2009/2/4 Randy Brown : > >> I'm having trouble getting the file system to mount after adding the rsize >> parameter. 
Here is my fstab entry: >> >> /dev/vgdevl/shared /fs/shared gfs defaults 0 0 >> >> I changed it to: >> >> /dev/vgdevl/shared /fs/shared gfs defaults,rsize=8192 >> 0 0 >> >> and when I try to mount the file system I get: >> >> [root at nfs2-cluster etc]# mount /fs/shared >> /sbin/mount.gfs: error mounting /dev/mapper/vgdevl-shared on /fs/shared: >> Invalid argument >> >> Looking though the man pages, it appears that the rsize switch should work. >> Any thoughts? >> >> Thanks, >> >> Randy >> >> Randy Brown wrote: >> >>> Great. Thanks. I'll give that a shot. I just need to find umount and >>> mount that file system to see if it worked. >>> >>> Randy >>> >>> Corey Kovacs wrote: >>> >>>> I had a similar problem. Reducing my rsize back to 8k from 32k fixed it. >>>> Problem root was the lock files in the .qt directory. >>>> >>>> >>>> Regards, >>>> >>>> Corey >>>> >>>> On Feb 4, 2009, at 16:00, Randy Brown wrote: >>>> >>>> >>>>> We are seeing a very strange issue with KDE with all kernels later than >>>>> 2.6.18-53.1.21 on our cluster nodes. When users attempt to log into the >>>>> machine using KDE, it never makes it to the KDE splash screen. The machine >>>>> just hangs on a solid blue screen. I can then restart the Xserver and login >>>>> using Gnome, and it logs in without any issues. Unfortunately, for us, KDE >>>>> is the standard desktop environment used by our development group. I can >>>>> reboot the the cluster to the 2.6.18-53.1.21 kernel and both KDE and Gnome >>>>> work as expected. Obviously, I'd prefer to be running the latest kernel. I >>>>> will gladly provide more info if needed. Any help or suggestions would be >>>>> greatly appreciated. >>>>> >>>>> Randy >>>>> >>>>> Info about our cluster: >>>>> It is a 2 node cluster running Centos 5 with up to date patches. I am >>>>> using the cluster as a NAS head to serve NFS mounts out to our network from >>>>> an iscsi SAN. >>>>> >>>>> Current running kernel: >>>>> 2.6.18-53.1.21.el5 (Running this kernel so KDE logins work) >>>>> >>>>> Most current installed kernel: >>>>> 2.6.18-92.1.22.el5 >>>>> >>>>> Cluster packages: >>>>> lvm2-cluster-2.02.32-4.el5 >>>>> cman-2.0.84-2.el5_2.3 >>>>> gfs2-utils-0.1.44-1.el5_2.1 >>>>> rgmanager-2.0.38-2.el5_2.1 >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Although your probably on the right track to the problem. KDE has known problems with NFS, because NFS cannot guarantee that a lock is granted. KDE requires that locks will to be granted on a user's /home directory otherwise you will get exactly what you are experiencing (KDE only get's through 3 of the 5 steps during logon, then hangs at a blue screen). Do a google search for "KDE on NFS" or something like that, and you'll see that it's a fairly common problem. The KDE developers themselves state that you should never use KDE when your home directories are on NFS, but there are ways around it. 
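A quick first sanity check is whether the NFS lock manager is even reachable from the client, since lock requests on an NFS-mounted home directory go through it. The commands below are only a sketch, and "nfsserver" is a placeholder for whichever host actually exports the home directories:

# does the server register the lock manager (nlockmgr) and status daemons?
rpcinfo -p nfsserver | grep -E 'nlockmgr|status'

# is the client-side lock service running?
service nfslock status

If both of those look healthy, the next place to look is how locks behave on the exported filesystem itself, which is where the GFS angle below comes in.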
Home directories on NFS is fairly common in large linux/unix deployments. If your experiencing this problem when your user home directories sit on GFS I would investigate GFS locking first up. There's probably something there that KDE disagrees with. Regards, Stewart From ccaulfie at redhat.com Thu Feb 5 08:41:49 2009 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 5 Feb 2009 03:41:49 -0500 (EST) Subject: [Linux-cluster] Moving clvm filesystem to lvm In-Reply-To: <49896F9E020000C8000112E3@health-first.org> Message-ID: <1611040022.4431331233823309414.JavaMail.root@zmail01.collab.prod.int.phx2.redhat.com> ----- "Danny Wall" wrote: > I have RHEL 4.3 with RHCS and GFS in two nodes. It is currently > experiencing problems daily. When I try to ls on the directory, it > never > finishes. Usually, it takes a few minutes to give the directory > listing. > If I restart the cluster services, it begins working again. > > I am in the process of building a new cluster based on RHEl 5, so I > would like to move the SAN GFS filesystem to a standalone server to > stop > the daily problems. Will I have a problem since the filesystem was > created for a cluster using clvmd, and I want to move it to LVM2 on a > RHEL5 standalone server? My understanding is the clvm does not have > any > different layout, it just communicates to other nodes, so this should > not be a problem. Yes, CLVM and LVM are exactly the same thing. You will need to mark the VG as non-clustered BEFORE removing it from clvmd control though :-) -- Chrissie From gianluca.cecchi at gmail.com Thu Feb 5 09:27:10 2009 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Thu, 5 Feb 2009 10:27:10 +0100 Subject: [Linux-cluster] Moving clvm filesystem to lvm Message-ID: <561c252c0902050127v3b7c63dq416cfe57b2eec56a@mail.gmail.com> On Wed, 04 Feb 2009 10:36:14 -0500 Danny Wall wrote: > Will I have a problem since the filesystem was > created for a cluster using clvmd, and I want to move it to LVM2 on a > RHEL5 standalone server? Take in mind that on the standalone node, if it has not all the clustersuite infrastructure (lvm2-cluster and its dependencies such as rgmanager, cman, ecc.) you have to modify the file /etc/lvm/lvm.conf specifying locking_type = 0 then vgchange -cn YOUR_VG reset lvm.conf to its default value vgchange -ay YOUR_VG See also knowledge base http://kbase.redhat.com/faq/docs/DOC-3619 HIH, Gianluca From gianluca.cecchi at gmail.com Thu Feb 5 09:30:07 2009 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Thu, 5 Feb 2009 10:30:07 +0100 Subject: [Linux-cluster] interdependency between different services possible? Message-ID: <561c252c0902050130k597c824frfe95a45e87cc29b7@mail.gmail.com> Can I have parent-child relations between different services of a cluster? Can I have placement policies relations between different services? 
It seems this is not covered inside the manuals, at least for rh el 5.2 An example could be if I have: service 1 with several fs resources on it and a virtual ip service 2 with an application insisting on the file systems and ip defined in service1, so that I want service 2 to start on the same node of service 1 or not at all I would like to keep the app resource of service 2 separate form the ones in service 1, for example to be able to stop the app but keeping the shared file systems and ip still accessible (if the app is a db I could do a backup of its files disabling service 2, but it applies also for maintenance operations) In this way I could run a clusvcadm -d service2 and make what I want on the still mounted filesystems provided by service1, accessing them with the cluster ip from a client Any hints on making this or something similar? Thanks, Gianluca From fdinitto at redhat.com Thu Feb 5 10:09:32 2009 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 05 Feb 2009 11:09:32 +0100 Subject: [Linux-cluster] [ANNOUNCE] Changes in the STABLE3 release procedure Message-ID: <1233828572.2659.31.camel@cerberus.int.fabbione.net> Hi everybody, for sometime now, the team has been considering the option to split the cluster 3.0.0 tarball into different pieces. The main reason driving this requirement is that both fence agents and resource agents receive a lot more bug fixes than the normal code. Updating the whole code, when only a script is updated, it's overkilling for the users. In order to reduce the amount of updates, we had to change the way in which we release from upstream. For the past couple of cluster-3.0.0.alpha releases, we introduced an helper script that generates two more (optional) tarballs. So this is how it works now. At release time, the automated release script will create 3 tarballs: - cluster-$version.tar.gz - resource-agents-$version.tar.gz - fence-agents-$version.tar.gz cluster-$version.tar.gz is _unchanged_ and it contains everything exactly like before. resource-agents-$version.tar.gz contains only the rgmanager resource agents. fence-agents-$version.tar.gz contains only fence agents. *If you package or build the whole stack from cluster-$version.tar.gz you do _NOT_ need resource-agents or fence-agents tarballs.* Building resource-agents and fence-agents tarballs is a bit more rough than building the whole stack as they inherit the whole build system from cluster. This will slowly change in time. If you are packager or a user and you want to switch to a "split" release, those are our recommendations: - configure/build cluster-$version.tar.gz with --without_fence_agents --without_resource_agents - package/install cluster-$version - configure fence-agents with normal options as you always did. fence-agents _requires_ cluster to be installed to build fence_xvm. - build fence-agents using: make -C fence/agents - package/install fence-agents using: make -C fence/agents install make -C fence/man install - configure resource-agents with normal options as you always did. - build resource-agents using: make -C rgmanager/src/resources - install resource-agents using: make -C rgmanager/src/resources - package/install resource-agents Make sure to keep the configure options (specially paths) consistent across the 3 tarballs or things will not work. At this point, you can simply update only fence-agents or resource-agents or only cluster depending on what has changed upstream. A lot of distribution ship rgmanager as standalone package/daemon. 
If you do, rgmanager will require/depend resource-agents to run. So make sure to express this in your favourite packaging system. If you have any doubt, please don't hesitate to ask or stay on the same track as before by using only cluster-$version.tar.gz. Fabio From lpleiman at redhat.com Thu Feb 5 14:29:51 2009 From: lpleiman at redhat.com (Leo Pleiman) Date: Thu, 5 Feb 2009 09:29:51 -0500 (EST) Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <1522715510.188231233844122140.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Message-ID: <1502175078.188291233844191856.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> I have a 7 node virtualized RHEL5.3 cluster. All nodes mount mulitple gfs filesystems. One node only only of the file systems won't mount. I know (from a previous incident) that rebooting the node will clear the problem. Below is some information, I'd appreciate any help to resolve this without rebooting the node. [root at rdccluster2-5 ~]# umount /decennial/ umount: /decennial/: not mounted [root at rdccluster2-5 ~]# mount /decennial /sbin/mount.gfs: fs is being unmounted /sbin/mount.gfs: error mounting lockproto lock_dlm [root at rdccluster2-5 ~]# strace /decennial ...snip lstat("/etc/mtab", {st_mode=S_IFREG|0644, st_size=1817, ...}) = 0 readlink("/decennial", 0x7fff43151700, 4096) = -1 EINVAL (Invalid argument) umask(077) = 022 open("/etc/fstab", O_RDONLY) = 3 umask(022) = 077 fstat(3, {st_mode=S_IFREG|0644, st_size=1695, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ab read(3, "/dev/vgrp00/Root / "..., 4096) = 1695 read(3, "", 4096) = 0 close(3) = 0 munmap(0x2abd6af59000, 4096) = 0 stat("/sbin/mount.gfs", {st_mode=S_IFREG|0755, st_size=41968, ...}) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, chil wait4(-1, /sbin/mount.gfs: fs is being unmounted /sbin/mount.gfs: error mounting lockproto lock_dlm [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 22646 --- SIGCHLD (Child exited) @ 0 (0) --- exit_group(1) = ? [root at rdccluster2-5 ~]# >From /var/log/messages Feb 5 09:18:39 rdccluster2-5 gfs_controld[2862]: mount: reject mount due to unmount Feb 5 09:19:00 rdccluster2-5 gfs_controld[2862]: mount: reject mount due to unmount --Leo From jumanjiman at gmail.com Thu Feb 5 15:15:57 2009 From: jumanjiman at gmail.com (jumanjiman at gmail.com) Date: Thu, 5 Feb 2009 15:15:57 +0000 Subject: [Linux-cluster] interdependency between different services possible? In-Reply-To: <561c252c0902050130k597c824frfe95a45e87cc29b7@mail.gmail.com> References: <561c252c0902050130k597c824frfe95a45e87cc29b7@mail.gmail.com> Message-ID: <1771741017-1233846955-cardhu_decombobulator_blackberry.rim.net-1092632408-@bxe144.bisx.prod.on.blackberry> A guiding principle for cluser suite is that an ha service (resource group) should be able to fail over w/o impacting any other RG. To accomplish what you want, simply put your init scripts in a single RG. For a complex scenario, you will probably need to write a custom script that combines the individual init scripts. And: consider whether your goal is really the best approach. It may be, but then again it may have the ultimate impact of reducing availability. This reduction is a byproduct of failure outcomes. Hth, -paul Sent via BlackBerry by AT&T -----Original Message----- From: Gianluca Cecchi Date: Thu, 5 Feb 2009 10:30:07 To: Subject: [Linux-cluster] interdependency between different services possible? 
Can I have parent-child relations between different services of a cluster? Can I have placement policies relations between different services? It seems this is not covered inside the manuals, at least for rh el 5.2 An example could be if I have: service 1 with several fs resources on it and a virtual ip service 2 with an application insisting on the file systems and ip defined in service1, so that I want service 2 to start on the same node of service 1 or not at all I would like to keep the app resource of service 2 separate form the ones in service 1, for example to be able to stop the app but keeping the shared file systems and ip still accessible (if the app is a db I could do a backup of its files disabling service 2, but it applies also for maintenance operations) In this way I could run a clusvcadm -d service2 and make what I want on the still mounted filesystems provided by service1, accessing them with the cluster ip from a client Any hints on making this or something similar? Thanks, Gianluca -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From teigland at redhat.com Thu Feb 5 16:07:27 2009 From: teigland at redhat.com (David Teigland) Date: Thu, 5 Feb 2009 10:07:27 -0600 Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <1502175078.188291233844191856.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> References: <1522715510.188231233844122140.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> <1502175078.188291233844191856.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Message-ID: <20090205160727.GA2029@redhat.com> On Thu, Feb 05, 2009 at 09:29:51AM -0500, Leo Pleiman wrote: > I have a 7 node virtualized RHEL5.3 cluster. All nodes mount mulitple gfs > filesystems. One node only only of the file systems won't mount. I know > (from a previous incident) that rebooting the node will clear the problem. > Below is some information, I'd appreciate any help to resolve this without > rebooting the node. > > [root at rdccluster2-5 ~]# umount /decennial/ > umount: /decennial/: not mounted > [root at rdccluster2-5 ~]# mount /decennial > /sbin/mount.gfs: fs is being unmounted > /sbin/mount.gfs: error mounting lockproto lock_dlm Seems something went wrong with an unmount; gfs_controld thinks the unmount is still in progress. Does 'ps' show any unmount still running? Does /proc/mounts show the fs still mounted in the kernel? What does group_tool -v show about the fs's? If the kernel portion of the unmount is done, and you can unmount any other gfs fs's, then you should be able to kill gfs_controld, stop the cluster on this node and restart it. Dave From lpleiman at redhat.com Thu Feb 5 16:43:22 2009 From: lpleiman at redhat.com (Leo Pleiman) Date: Thu, 5 Feb 2009 11:43:22 -0500 (EST) Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <628584858.191371233851968251.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Message-ID: <412007529.191441233852202050.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> > On Thu, Feb 05, 2009 at 09:29:51AM -0500, Leo Pleiman wrote: > > I have a 7 node virtualized RHEL5.3 cluster. All nodes mount > mulitple gfs > > filesystems. One node only only of the file systems won't mount. I > know > > (from a previous incident) that rebooting the node will clear the > problem. > > Below is some information, I'd appreciate any help to resolve this > without > > rebooting the node. 
> > > > [root at rdccluster2-5 ~]# umount /decennial/ > > umount: /decennial/: not mounted > > [root at rdccluster2-5 ~]# mount /decennial > > /sbin/mount.gfs: fs is being unmounted > > /sbin/mount.gfs: error mounting lockproto lock_dlm > > Seems something went wrong with an unmount; gfs_controld thinks the > unmount is > still in progress. Does 'ps' show any unmount still running? Does > /proc/mounts show the fs still mounted in the kernel? What does > group_tool -v > show about the fs's? If the kernel portion of the unmount is done, > and you can > unmount any other gfs fs's, then you should be able to kill > gfs_controld, stop > the cluster on this node and restart it. > > Dave The other gfs fs are in use by the customer, the idea is to restore the mount without interruption, hence I didn't want to reboot the node. /proc/mounts shows the fs is unmounted group_tool -v shows type level name id state node id local_done fence 0 default 00010007 none [1 2 3 4 5 6 7] dlm 1 clvmd 00010006 none [1 2 3 4 5 6 7] dlm 1 apps1 00100005 none [1 2 3 4 5 6 7] dlm 1 Home 00120005 none [1 2 3 4 5 6 7] dlm 1 Demographic 00140005 none [1 2 3 4 5 6 7] dlm 1 Economic 00160005 none [1 2 3 4 5 6 7] dlm 1 mixed 00180005 none [1 2 3 4 5 6 7] dlm 1 Rdcstaff 00190005 none [1 2 3 4 5 6 7] dlm 1 Lbddmz 001b0005 none [1 2 3 4 5 6 7] dlm 1 washdc_projects1 001c0005 none [1 2 3 4 5 6 7] dlm 1 Baruch 001e0005 none [1 2 3 4 5 6 7] dlm 1 Berkley 00200005 none [1 2 3 4 5 6 7] dlm 1 Boston 00220005 none [1 2 3 4 5 6 7] dlm 1 Chicago 00230005 none [1 3 4 5 6 7] dlm 1 Cornell 00130002 none [1 2 3 4 5 6 7] dlm 1 mich_projects 00150002 none [1 2 3 4 5 6 7] dlm 1 tri_projects 00170002 none [1 2 3 4 5 6 7] dlm 1 rdcfed1 00240005 none [1 2 3 4 5 7] dlm 1 rgmanager 00190002 none [1 2 3 4 5 6 7] gfs 2 Decennial 00020005 none [1 2 3 4 5 6 7] gfs 2 public 00060005 none [1 2 3 4 5 6 7] gfs 2 ucla_projects1 000e0005 none [1 2 3 4 5 6 7] gfs 2 apps1 000f0005 none [1 2 3 4 5 6 7] gfs 2 Home 00110005 none [1 2 3 4 5 6 7] gfs 2 Demographic 00130005 none [1 2 3 4 5 6 7] gfs 2 Economic 00150005 none [1 2 3 4 5 6 7] gfs 2 mixed 00170005 none [1 2 3 4 5 6 7] gfs 2 Rdcstaff 00070001 none [1 2 3 4 5 6 7] gfs 2 Lbddmz 001a0005 none [1 2 3 4 5 6 7] gfs 2 washdc_projects1 00090001 none [1 2 3 4 5 6 7] gfs 2 Baruch 001d0005 none [1 2 3 4 5 6 7] gfs 2 Berkley 001f0005 none [1 2 3 4 5 6 7] gfs 2 Boston 00210005 none [1 2 3 4 5 6 7] gfs 2 Chicago 000d0002 none [1 2 3 4 5 6 7] gfs 2 Cornell 00120002 none [1 2 3 4 5 6 7] gfs 2 mich_projects 00140002 none [1 2 3 4 5 6 7] gfs 2 tri_projects 00160002 none [1 2 3 4 5 6 7] gfs 2 rdcfed1 00140006 none [1 2 3 4 5 6 7] --Leo From teigland at redhat.com Thu Feb 5 16:58:09 2009 From: teigland at redhat.com (David Teigland) Date: Thu, 5 Feb 2009 10:58:09 -0600 Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <412007529.191441233852202050.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> References: <628584858.191371233851968251.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> <412007529.191441233852202050.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Message-ID: <20090205165808.GB2029@redhat.com> On Thu, Feb 05, 2009 at 11:43:22AM -0500, Leo Pleiman wrote: > The other gfs fs are in use by the customer, the idea is to restore the > mount without interruption, hence I didn't want to reboot the node. > > /proc/mounts shows the fs is unmounted You could try, "umount.gfs -v -X /decennial" which skips the kernel unmount (which is done) and just tells gfs_controld to remove its record for the fs. 
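Spelled out, the sequence on the affected node would look something like this (just a sketch using the same mount point as above; adjust the names to your setup):

# confirm the kernel side really is unmounted
grep decennial /proc/mounts

# confirm gfs_controld still has a record of the mountgroup
group_tool -v | grep -i decennial

# drop the stale record, then retry the mount
umount.gfs -v -X /decennial
mount /decennial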
If that doesn't work, then I can't think of any other options but restarting the cluster on that node (shouldn't require a full machine reboot). Dave From lpleiman at redhat.com Thu Feb 5 17:23:21 2009 From: lpleiman at redhat.com (Leo Pleiman) Date: Thu, 5 Feb 2009 12:23:21 -0500 (EST) Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <20090205165808.GB2029@redhat.com> Message-ID: <1476393982.192391233854601228.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> ----- "David Teigland" wrote: > On Thu, Feb 05, 2009 at 11:43:22AM -0500, Leo Pleiman wrote: > > > The other gfs fs are in use by the customer, the idea is to restore > the > > mount without interruption, hence I didn't want to reboot the node. > > > > /proc/mounts shows the fs is unmounted > > You could try, "umount.gfs -v -X /decennial" which skips the kernel > unmount > (which is done) and just tells gfs_controld to remove its record for > the fs. > If that doesn't work, then I can't think of any other options but > restarting > the cluster on that node (shouldn't require a full machine reboot). > > Dave Sorry to be a pest, I've looked at man pages, how would I tell gfs_controld to remove its record for the fs? --Leo From teigland at redhat.com Thu Feb 5 17:25:39 2009 From: teigland at redhat.com (David Teigland) Date: Thu, 5 Feb 2009 11:25:39 -0600 Subject: [Linux-cluster] GFS Mount Problem In-Reply-To: <1476393982.192391233854601228.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> References: <20090205165808.GB2029@redhat.com> <1476393982.192391233854601228.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> Message-ID: <20090205172539.GC2029@redhat.com> On Thu, Feb 05, 2009 at 12:23:21PM -0500, Leo Pleiman wrote: > > ----- "David Teigland" wrote: > > > On Thu, Feb 05, 2009 at 11:43:22AM -0500, Leo Pleiman wrote: > > > > > The other gfs fs are in use by the customer, the idea is to restore > > the > > > mount without interruption, hence I didn't want to reboot the node. > > > > > > /proc/mounts shows the fs is unmounted > > > > You could try, "umount.gfs -v -X /decennial" which skips the kernel > > unmount > > (which is done) and just tells gfs_controld to remove its record for > > the fs. > > If that doesn't work, then I can't think of any other options but > > restarting > > the cluster on that node (shouldn't require a full machine reboot). > > > > Dave > > Sorry to be a pest, I've looked at man pages, how would I tell gfs_controld > to remove its record for the fs? The command "umount.gfs -v -X /decennial" From rajpurush at gmail.com Thu Feb 5 18:40:34 2009 From: rajpurush at gmail.com (Rajeev P) Date: Fri, 6 Feb 2009 00:10:34 +0530 Subject: [Linux-cluster] SCSI PR fencing Message-ID: <7a271b290902051040n1306b8f8qc6cdd6e2be5d3b62@mail.gmail.com> Hi, Consider a two node RHEL5 cluster that is configured with SCSI PR fencing but without a qdisk. In the event of network split, nodes will attempt to fence the other one. One of the node's (node that lost the race) key will be successfully removed from the disk and key of the node that won race will be present on the disk. The question I have is that, in the event of a network split, is there a possibility (even remote) for both the nodes to "successfully" fence each other? That is key of both the nodes are successfully removed from the disk. I am trying draw comparision with HP iLO fencing mechanism, where there is possibily for the nodes to fence each and hence bringing down the entire cluster. 
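For what it's worth, the outcome of such a race can be inspected directly on the LUN with sg_persist from sg3_utils; this is only a sketch and /dev/sdb is a placeholder for the shared disk:

# list the registration keys still present on the device
sg_persist -n -i -k -d /dev/sdb

# show the current reservation, i.e. which registrant holds the disk
sg_persist -n -i -r -d /dev/sdb

If fencing behaved as intended, only the surviving node's key should remain registered.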
Thanks, Rajeev Purushottam -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohara at redhat.com Fri Feb 6 01:24:50 2009 From: rohara at redhat.com (Ryan O'Hara) Date: Thu, 5 Feb 2009 19:24:50 -0600 Subject: [Linux-cluster] SCSI PR fencing In-Reply-To: <7a271b290902051040n1306b8f8qc6cdd6e2be5d3b62@mail.gmail.com> References: <7a271b290902051040n1306b8f8qc6cdd6e2be5d3b62@mail.gmail.com> Message-ID: <20090206012450.GA7696@redhat.com> On Fri, Feb 06, 2009 at 12:10:34AM +0530, Rajeev P wrote: > Hi, > > Consider a two node RHEL5 cluster that is configured with SCSI PR fencing > but without a qdisk. In the event of network split, nodes will attempt to > fence the other one. One of the node's (node that lost the race) key will be > successfully removed from the disk and key of the node that won race will be > present on the disk. This is correct. > The question I have is that, in the event of a network split, is there a > possibility (even remote) for both the nodes to "successfully" fence each > other? That is key of both the nodes are successfully removed from the > disk. It should not be possible for both nodes to remove the other node's key. The first "unregister" command to hit the scsi device will determine the winner. The reason this works is due to the fact that a node must be registered with a device in order to remove keys. Ryan From mockey.chen at nsn.com Fri Feb 6 08:17:30 2009 From: mockey.chen at nsn.com (Mockey Chen) Date: Fri, 06 Feb 2009 16:17:30 +0800 Subject: [Linux-cluster] How to avoid two-nodes cluster split-brain without shared storage Message-ID: <498BF21A.8040005@nsn.com> Hi, I have a two node cluster, due to design limitation, there is no shared storage exist. So I can not use quorum disk. Is there any other reliable way to prevent split-brain? I searched google, but I can not found any solution suitable for my situation. Any comment is appreciated. Best Regards. Chen Ming From gianluca.cecchi at gmail.com Fri Feb 6 11:41:12 2009 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Fri, 6 Feb 2009 12:41:12 +0100 Subject: [Linux-cluster] interdependency between different services possible? Message-ID: <561c252c0902060341m5bff0397o13ae4e765b156c5c@mail.gmail.com> On Thu, 5 Feb 2009 15:15:57 +0000 jumanjiman wrote: > A guiding principle for cluser suite is that an ha service (resource group) should be able to fail over w/o impacting any other RG. Ok, it does make sense. I would like to dig to dependencies a little more, to understand better the degree of flexibility and the correct approach. I take for example a DB as application, but I think the scenario could be similar for other apps. Tipically if the DB is Oracle you would spread datafiles and other structures on several different filesystems. Based on above considerations, if I have a 2-nodes cluster where I want to put 2 oracle instances on HA I should make something like service1 with resources ip1, fs1, fs2, fs3, orasid1 service2 with resources ip2, fs4, fs5, fs6, orasid2 This is the maximum flexibility so that at run-time I can get the two instances on different nodes and leverage the overall power available, keeping one node strong enough to be able to sustain the load of both the instances in case of failure/maintenance. But it could be a problem if I have not so many different LUNs available on the SAN for example. Or if I have problems to bind the two instances to different ip addresses. 
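To make the layout above concrete, the rm section of cluster.conf might look roughly like the sketch below. Everything in it (address, volume group names, mount points, script path) is an invented placeholder, only service1 is spelled out, and service2 would repeat the pattern with ip2, fs4-fs6 and orasid2. The questions that follow still apply to exactly this layout:

<rm>
    <service autostart="1" name="service1" recovery="relocate">
        <ip address="192.168.1.201" monitor_link="1"/>
        <fs name="fs1" device="/dev/vg_ora1/lv_data" mountpoint="/ora1/data" fstype="ext3" force_unmount="1"/>
        <fs name="fs2" device="/dev/vg_ora1/lv_redo" mountpoint="/ora1/redo" fstype="ext3" force_unmount="1"/>
        <fs name="fs3" device="/dev/vg_ora1/lv_arch" mountpoint="/ora1/arch" fstype="ext3" force_unmount="1"/>
        <script name="orasid1" file="/etc/init.d/oracle_sid1"/>
    </service>
</rm>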
Some questions: 1) In this scenario is it correct to say that orasid1, being a script, is started as the latest resource, but if for example fs3 fails to mount, the RG attempts anyway to start the orasid1 instance, because there is not an explicit dependency upon other resources? Or is the script attempted to be run only if all the other fs resources have started ok, by default? If not so, I can put a dependency, but can it be on all the three filesystems reosurces and not only one of them? I see something such as this into the docs From spods at iinet.net.au Fri Feb 13 05:06:29 2009 From: spods at iinet.net.au (spods at iinet.net.au) Date: Fri, 13 Feb 2009 14:06:29 +0900 Subject: [Linux-cluster] Alternate NIC for clustering Message-ID: <55258.1234501589@iinet.net.au> Sorry if this reply is a repeat. I sent a reply to this post when my mail client was foolishly not set for plaintext emails only. As far as I can tell the list manager has stripped it, but it could just be slow. Apologies again if that's the case. --------------------------------------------------------- Hi, Your issue is likely not to be cluster related, but a problem with routing or name resolution. Please send the outputs of the following commands to me privately (not replied to the list) so I can assist: cat /etc/nsswitch.conf netstat -rn ip rule show ip route show cat /etc/hosts cat /etc/resolv.conf nslookup 172.38.1.17 nslookup 172.38.1.18 nslookup 192.168.190.86 nslookup 192.168.190.87 nslookup ricci1b.gallup.com nslookup ricci2b.gallup.com nslookup ricci1b-ic.gallup.com nslookup ricci2b-ic.gallup.com ifconfig -a Any configs that relate to how eth1 and eth2 are bonded. Also, not that I think it makes a difference, but 172.38.0.0 is *not* a part of the IANA reserved address space for private use (if that's what you were intending). For further info, please read section 3 of RFC 1918 at http://www.isi.edu/in-notes/rfc1918.txt. I doubt this would relate to your problem, however. Regards, Stewart ----------------------------------------------------- [Linux-cluster] Alternate NIC for clustering * From: "Hunt, Gary" * To: "Linux-cluster redhat com" * Cc: * Subject: [Linux-cluster] Alternate NIC for clustering * Date: Thu, 12 Feb 2009 13:26:17 -0600 Does anyone know how to get the cluster to use an alternate private network for clustering? I have 2 nodes RHEL 5.3 Eth0 is the public interface Eth1 and eth2 are bonded I added entries for the bonded interface in both hosts file. Here is my hosts file 192.168.190.86 ricci1b ricci1b.gallup.com 192.168.190.87 ricci2b ricci2b.gallup.com 172.38.1.17 ricci1b-ic ricci1b-ic.gallup.com 172.38.1.18 ricci2b-ic ricci2b-ic.gallup.com Here is my cluster.conf I get this error in my log file Feb 12 11:34:27 ricci1b openais[6353]: [MAIN ] local node name "ricci1b.gallup.com" not found in cluster.conf Thanks Gary From kein.he at gmail.com Fri Feb 13 05:21:06 2009 From: kein.he at gmail.com (Kein He) Date: Fri, 13 Feb 2009 13:21:06 +0800 Subject: [Linux-cluster] How to clean up conga database ? Message-ID: <49950342.60606@gmail.com> Hi guys, Is there anyway to clean up conga database ( back to state that before "luci_admin init" )? I searched the official site, but i didn't find anything useful. Thanks! From esggrupos at gmail.com Fri Feb 13 08:23:36 2009 From: esggrupos at gmail.com (ESGLinux) Date: Fri, 13 Feb 2009 09:23:36 +0100 Subject: [Linux-cluster] why all services stops when a node reboots? 
In-Reply-To: <5f39cb8e0902121138x3c243fd8j25d3933c970df088@mail.gmail.com> References: <3128ba140902120017r3f9a8722t239c48e3e0a7bd9a@mail.gmail.com> <5f39cb8e0902121019u4b2d1307j5dd0f23fcf3f5506@mail.gmail.com> <3128ba140902121123r60ef584cxab5574f7abecb9e6@mail.gmail.com> <5f39cb8e0902121138x3c243fd8j25d3933c970df088@mail.gmail.com> Message-ID: <3128ba140902130023k663372f6qa83e49fb3df98799@mail.gmail.com> Hello, The services run ok on node1. If I halt node2 and try to run the services the run ok on node1. If I run the services without cluster they also run ok. I have eliminated the HTTP services and I have left the service BBDD to debug the problem. Here is the log when the service is running on node2 and node1 comes up: Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because I am the rep. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq receiv ed 1a Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for ring 17 f4 Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member 192.168.1.185: Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168. 1.185 Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a received f lag 1 Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member 192.168.1.188: Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168. 1.188 Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 received fla g 1 Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any messa ges in recovery. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the primary component and will provide service. Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state. 
Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.185 Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message 192.168.1.188 Feb 13 09:16:00 NODE2 openais[3326]: [CPG ] got joinlist message from node 2 Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1 Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Relocating service:BBDD to better node node1 Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Stopping service service:BBDD Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb > Failed - Application Is Still Running Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb > Failed Feb 13 09:16:25 NODE2 clurgmgrd[4001]: stop on mysql "mydb" returned 1 (generic error) Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for 192.168.1.183 on eth0. Feb 13 09:16:35 NODE2 clurgmgrd[4001]: #12: RG service:BBDD failed to stop; intervention required Feb 13 09:16:35 NODE2 clurgmgrd[4001]: Service service:BBDD is failed Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #70: Failed to relocate service:BBDD; restarting locally Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #43: Service service:BBDD has failed; can not start. Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #2: Service service:BBDD returned failure code. Last Owner: node2 Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #4: Administrator intervention required. As you can see in the message "Relocating service:BBDD to better node node1" But it fails Another error that appears frecuently in my logs is the next: Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid [mysql:mydb] > Failed - File Doesn't Exist I dont know if this is important. but I think this makes the message err> Stopping Service mysql:mydb > Failed - Application Is Still Running and this makes the service fails (I?m just guessing...) Any idea? ESG 2009/2/12 rajveer singh > Hi, > > Ok, perhaps there is some problem with the services on node1 , so, are you > able to run these services on node1 without cluster. You first stop the > cluster, and try to run these services on node1. > > It should run. > > Re, > Rajveer Singh > > 2009/2/13 ESGLinux > > Hello, >> >> Thats what I want, when node1 comes up I want to relocate to node1 but >> what I get is all my services stoped and in failed state. >> >> With my configuration I expect to have the services running on node1. >> >> Any idea about this behaviour? >> >> Thanks >> >> ESG >> >> >> 2009/2/12 rajveer singh >> >> >>> >>> 2009/2/12 ESGLinux >>> >>>> Hello all, >>>> >>>> I?m testing a cluster using luci as admin tool. I have configured 2 >>>> nodes with 2 services http + mysql. This configuration works almost fine. I >>>> have the services running on the node1 >>>> and y reboot this node1. Then the services relocates to node2 and all >>>> contnues working but, when the node1 goes up all the services stops. >>>> >>>> I think that the node1, when comes alive, tries to run the services and >>>> that makes the services stops, can it be true? I think node1 should not >>>> start anything because the services are running in node2. >>>> >>>> Perphaps is a problem with the configuration, perhaps with fencing (i >>>> have not configured fencing at all) >>>> >>>> here is my cluster.conf. Any idea? 
>>>> >>>> Thanks in advace >>>> >>>> ESG >>>> >>>> >>>> >>>> >>>> >>> post_join_delay="3"/> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> nofailback="0" ordere >>>> d="1" restricted="1"> >>>> * >>> priority="1"/> >>>> * * >>> priority="2"/> >>>> * >>>> >>>> >>>> >>>> >>>> >>> exclusive="0" name=" >>>> HTTP" recovery="relocate"> >>>> >>> name="http" server >>>> _root="/etc/httpd" shutdown_wait="0"/> >>>> >>>> >>>> >>> exclusive="0" name=" >>>> BBDD" recovery="relocate"> >>>> >>> listen_address="192.168 >>>> .1.183" name="mydb" shutdown_wait="0"/> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> Hi ESG, >>> >>> Offcoures, as you have defined the priority of node1 as 1 and node2 as 2, >>> so node1 is having more priority, so whenever it will be up, it will try to >>> run the service on itself and so it will relocate the service from node2 to >>> node1. >>> >>> >>> Re, >>> Rajveer Singh >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esggrupos at gmail.com Fri Feb 13 08:39:45 2009 From: esggrupos at gmail.com (ESGLinux) Date: Fri, 13 Feb 2009 09:39:45 +0100 Subject: [Linux-cluster] why all services stops when a node reboots? In-Reply-To: <3128ba140902130023k663372f6qa83e49fb3df98799@mail.gmail.com> References: <3128ba140902120017r3f9a8722t239c48e3e0a7bd9a@mail.gmail.com> <5f39cb8e0902121019u4b2d1307j5dd0f23fcf3f5506@mail.gmail.com> <3128ba140902121123r60ef584cxab5574f7abecb9e6@mail.gmail.com> <5f39cb8e0902121138x3c243fd8j25d3933c970df088@mail.gmail.com> <3128ba140902130023k663372f6qa83e49fb3df98799@mail.gmail.com> Message-ID: <3128ba140902130039x70f81d02nd0099a7f5550e7c4@mail.gmail.com> More clues, using system-config-cluster When I try to run a service in state failed I always get an error. I have tu disable the service, to get disabled state. With this state I can restart the services. I think I have a problem with the relocate because I cant do it nor with luci nor with system-config-cluster nor with clusvadm I always get error when i try this greetings ESG 2009/2/13 ESGLinux > Hello, > > The services run ok on node1. If I halt node2 and try to run the services > the run ok on node1. > If I run the services without cluster they also run ok. > > I have eliminated the HTTP services and I have left the service BBDD to > debug the problem. Here is the log when the service is running on node2 and > node1 comes up: > > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11. > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because > I > am > the rep. > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq > receiv > ed 1a > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for > ring > 17 > f4 > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state. > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state. 
> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member > 192.168.1.185: > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep > 192.168. > 1.185 > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a > received > f > lag 1 > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member > 192.168.1.188: > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep > 192.168. > 1.188 > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 > received > fla > g 1 > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any > messa > ges in recovery. > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) > Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the > primary component and will provide service. > Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state. > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message > 192.168.1.185 > Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message > 192.168.1.188 > Feb 13 09:16:00 NODE2 openais[3326]: [CPG ] got joinlist message from node > 2 > Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1 > Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Relocating service:BBDD to > better node node1 > Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Stopping service > service:BBDD > Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb > > Failed - Application Is Still Running > Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb > > Failed > Feb 13 09:16:25 NODE2 clurgmgrd[4001]: stop on mysql "mydb" > returned 1 (generic error) > Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for > 192.168.1.183 on eth0. > Feb 13 09:16:35 NODE2 clurgmgrd[4001]: #12: RG service:BBDD failed > to stop; intervention required > Feb 13 09:16:35 NODE2 clurgmgrd[4001]: Service service:BBDD is > failed > Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #70: Failed to relocate > service:BBDD; restarting locally > Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #43: Service service:BBDD has > failed; can not start. > Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #2: Service service:BBDD > returned failure code. Last Owner: node2 > Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #4: Administrator > intervention required. > > > As you can see in the message "Relocating service:BBDD to better node > node1" > > But it fails > > Another error that appears frecuently in my logs is the next: > > Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid > [mysql:mydb] > Failed - File Doesn't Exist > > I dont know if this is important. 
but I think this makes the message err> > Stopping Service mysql:mydb > Failed - Application Is Still Running and this > makes the service fails (I?m just guessing...) > > Any idea? > > > ESG > > > 2009/2/12 rajveer singh > >> Hi, >> >> Ok, perhaps there is some problem with the services on node1 , so, are you >> able to run these services on node1 without cluster. You first stop the >> cluster, and try to run these services on node1. >> >> It should run. >> >> Re, >> Rajveer Singh >> >> 2009/2/13 ESGLinux >> >> Hello, >>> >>> Thats what I want, when node1 comes up I want to relocate to node1 but >>> what I get is all my services stoped and in failed state. >>> >>> With my configuration I expect to have the services running on node1. >>> >>> Any idea about this behaviour? >>> >>> Thanks >>> >>> ESG >>> >>> >>> 2009/2/12 rajveer singh >>> >>> >>>> >>>> 2009/2/12 ESGLinux >>>> >>>>> Hello all, >>>>> >>>>> I?m testing a cluster using luci as admin tool. I have configured 2 >>>>> nodes with 2 services http + mysql. This configuration works almost fine. I >>>>> have the services running on the node1 >>>>> and y reboot this node1. Then the services relocates to node2 and all >>>>> contnues working but, when the node1 goes up all the services stops. >>>>> >>>>> I think that the node1, when comes alive, tries to run the services and >>>>> that makes the services stops, can it be true? I think node1 should not >>>>> start anything because the services are running in node2. >>>>> >>>>> Perphaps is a problem with the configuration, perhaps with fencing (i >>>>> have not configured fencing at all) >>>>> >>>>> here is my cluster.conf. Any idea? >>>>> >>>>> Thanks in advace >>>>> >>>>> ESG >>>>> >>>>> >>>>> >>>>> >>>>> >>>> post_join_delay="3"/> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> nofailback="0" ordere >>>>> d="1" restricted="1"> >>>>> * >>>> priority="1"/> >>>>> * * >>>> priority="2"/> >>>>> * >>>>> >>>>> >>>>> >>>>> >>>>> >>>> exclusive="0" name=" >>>>> HTTP" recovery="relocate"> >>>>> >>>> name="http" server >>>>> _root="/etc/httpd" shutdown_wait="0"/> >>>>> >>>>> >>>>> >>>> exclusive="0" name=" >>>>> BBDD" recovery="relocate"> >>>>> >>>> listen_address="192.168 >>>>> .1.183" name="mydb" shutdown_wait="0"/> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>> >>>> Hi ESG, >>>> >>>> Offcoures, as you have defined the priority of node1 as 1 and node2 as >>>> 2, so node1 is having more priority, so whenever it will be up, it will try >>>> to run the service on itself and so it will relocate the service from node2 >>>> to node1. >>>> >>>> >>>> Re, >>>> Rajveer Singh >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grimme at atix.de Fri Feb 13 08:56:35 2009 From: grimme at atix.de (Marc Grimme) Date: Fri, 13 Feb 2009 09:56:35 +0100 Subject: [Linux-cluster] qdisk on iscsi Message-ID: <200902130956.36116.grimme@atix.de> Hello together, has anybody ever tried to use the qdisk on iscsi? 
I have problems with a qdisk on iscsi. If I disable the qdisk everything works find. It looks like when the qdisk is started it cannot (somehow) successfully join the fencedomain as cman_tool services shows JOIN_POST_WAIT for the fencedomain (forever). And therefor the cluster works until a node has to be fenced ;-( . Has anybody seen such strange things with the qdisk? Are there some dependencies that are not yet supported for iscsi and qdisk? Thanks for your help. Marc. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ From esggrupos at gmail.com Fri Feb 13 10:44:35 2009 From: esggrupos at gmail.com (ESGLinux) Date: Fri, 13 Feb 2009 11:44:35 +0100 Subject: [Linux-cluster] why all services stops when a node reboots? In-Reply-To: <3128ba140902130039x70f81d02nd0099a7f5550e7c4@mail.gmail.com> References: <3128ba140902120017r3f9a8722t239c48e3e0a7bd9a@mail.gmail.com> <5f39cb8e0902121019u4b2d1307j5dd0f23fcf3f5506@mail.gmail.com> <3128ba140902121123r60ef584cxab5574f7abecb9e6@mail.gmail.com> <5f39cb8e0902121138x3c243fd8j25d3933c970df088@mail.gmail.com> <3128ba140902130023k663372f6qa83e49fb3df98799@mail.gmail.com> <3128ba140902130039x70f81d02nd0099a7f5550e7c4@mail.gmail.com> Message-ID: <3128ba140902130244l73d4ee74ga90a3407c3dc8e57@mail.gmail.com> hello all following with the problem, anyone can explain this: The commands are run all in aprox 1 minute: disable the service [root at NODE2 log]# clusvcadm -d BBDD Local machine disabling service:BBDD...Yes enable the service [root at NODE2 log]# clusvcadm -e BBDD Local machine trying to enable service:BBDD...Success service:BBDD is now running on node2 its ok, the service is running in node2, try to relocate to node1 root at NODE2 log]# clusvcadm -r BBDD -m node1 Trying to relocate service:BBDD to node1...Success it works!!! fine, try to relocate again to node2 service:BBDD is now running on node1 [root at NODE2 log]# clusvcadm -r BBDD -m node2 Trying to relocate service:BBDD to node2...Success it works again !!! I cant believe it. Try to relocate to node1 again service:BBDD is now running on node2 [root at NODE2 log]# clusvcadm -r BBDD -m node1 Trying to relocate service:BBDD to node1...Failure Opps!! it fails!!! Why? why 30 secs before it works and now it fails? In this situation all I can do is enable an disable the service again to get it works. It never gets up automatically... [root at NODE2 log]# clusvcadm -d BBDD Local machine disabling service:BBDD...Yes [root at NODE2 log]# clusvcadm -e BBDD Local machine trying to enable service:BBDD...Success service:BBDD is now running on node2 Any explanation for this behaviour??? I?m complety astonished :-( TIA ESG 2009/2/13 ESGLinux > More clues, > > using system-config-cluster > > When I try to run a service in state failed I always get an error. > I have tu disable the service, to get disabled state. With this state I can > restart the services. > > I think I have a problem with the relocate because I cant do it nor with > luci nor with system-config-cluster nor with clusvadm > > I always get error when i try this > > greetings > > ESG > > > 2009/2/13 ESGLinux > >> Hello, >> >> The services run ok on node1. If I halt node2 and try to run the services >> the run ok on node1. >> If I run the services without cluster they also run ok. >> >> I have eliminated the HTTP services and I have left the service BBDD to >> debug the problem. 
Here is the log when the service is running on node2 and >> node1 comes up: >> >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from >> 11. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because >> I >> am >> the rep. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq >> receiv >> ed 1a >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for >> ring >> 17 >> f4 >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member >> 192.168.1.185: >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep >> 192.168. >> 1.185 >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a >> received >> f >> lag 1 >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member >> 192.168.1.188: >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep >> 192.168. >> 1.188 >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 >> received >> fla >> g 1 >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any >> messa >> ges in recovery. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] CLM CONFIGURATION CHANGE >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] New Configuration: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.185) >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Left: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] Members Joined: >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] r(0) ip(192.168.1.188) >> Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the >> primary component and will provide service. >> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state. >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message >> 192.168.1.185 >> Feb 13 09:16:00 NODE2 openais[3326]: [CLM ] got nodejoin message >> 192.168.1.188 >> Feb 13 09:16:00 NODE2 openais[3326]: [CPG ] got joinlist message from >> node 2 >> Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1 >> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Relocating service:BBDD to >> better node node1 >> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: Stopping service >> service:BBDD >> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb >> > Failed - Application Is Still Running >> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: Stopping Service mysql:mydb >> > Failed >> Feb 13 09:16:25 NODE2 clurgmgrd[4001]: stop on mysql "mydb" >> returned 1 (generic error) >> Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for >> 192.168.1.183 on eth0. 
>> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: #12: RG service:BBDD failed >> to stop; intervention required >> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: Service service:BBDD is >> failed >> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #70: Failed to relocate >> service:BBDD; restarting locally >> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #43: Service service:BBDD has >> failed; can not start. >> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #2: Service service:BBDD >> returned failure code. Last Owner: node2 >> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: #4: Administrator >> intervention required. >> >> >> As you can see in the message "Relocating service:BBDD to better node >> node1" >> >> But it fails >> >> Another error that appears frecuently in my logs is the next: >> >> Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid >> [mysql:mydb] > Failed - File Doesn't Exist >> >> I dont know if this is important. but I think this makes the message err> >> Stopping Service mysql:mydb > Failed - Application Is Still Running and this >> makes the service fails (I?m just guessing...) >> >> Any idea? >> >> >> ESG >> >> >> 2009/2/12 rajveer singh >> >>> Hi, >>> >>> Ok, perhaps there is some problem with the services on node1 , so, are >>> you able to run these services on node1 without cluster. You first stop the >>> cluster, and try to run these services on node1. >>> >>> It should run. >>> >>> Re, >>> Rajveer Singh >>> >>> 2009/2/13 ESGLinux >>> >>> Hello, >>>> >>>> Thats what I want, when node1 comes up I want to relocate to node1 but >>>> what I get is all my services stoped and in failed state. >>>> >>>> With my configuration I expect to have the services running on node1. >>>> >>>> Any idea about this behaviour? >>>> >>>> Thanks >>>> >>>> ESG >>>> >>>> >>>> 2009/2/12 rajveer singh >>>> >>>> >>>>> >>>>> 2009/2/12 ESGLinux >>>>> >>>>>> Hello all, >>>>>> >>>>>> I?m testing a cluster using luci as admin tool. I have configured 2 >>>>>> nodes with 2 services http + mysql. This configuration works almost fine. I >>>>>> have the services running on the node1 >>>>>> and y reboot this node1. Then the services relocates to node2 and all >>>>>> contnues working but, when the node1 goes up all the services stops. >>>>>> >>>>>> I think that the node1, when comes alive, tries to run the services >>>>>> and that makes the services stops, can it be true? I think node1 should not >>>>>> start anything because the services are running in node2. >>>>>> >>>>>> Perphaps is a problem with the configuration, perhaps with fencing (i >>>>>> have not configured fencing at all) >>>>>> >>>>>> here is my cluster.conf. Any idea? 
>>>>>> >>>>>> Thanks in advace >>>>>> >>>>>> ESG >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> post_join_delay="3"/> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> nofailback="0" ordere >>>>>> d="1" restricted="1"> >>>>>> * >>>>> priority="1"/> >>>>>> * * >>>>> priority="2"/> >>>>>> * >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> exclusive="0" name=" >>>>>> HTTP" recovery="relocate"> >>>>>> >>>>> name="http" server >>>>>> _root="/etc/httpd" shutdown_wait="0"/> >>>>>> >>>>>> >>>>>> >>>>> exclusive="0" name=" >>>>>> BBDD" recovery="relocate"> >>>>>> >>>>> listen_address="192.168 >>>>>> .1.183" name="mydb" shutdown_wait="0"/> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Linux-cluster mailing list >>>>>> Linux-cluster at redhat.com >>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>>> >>>>> >>>>> Hi ESG, >>>>> >>>>> Offcoures, as you have defined the priority of node1 as 1 and node2 as >>>>> 2, so node1 is having more priority, so whenever it will be up, it will try >>>>> to run the service on itself and so it will relocate the service from node2 >>>>> to node1. >>>>> >>>>> >>>>> Re, >>>>> Rajveer Singh >>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>> >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nick at javacat.f2s.com Fri Feb 13 15:01:57 2009 From: nick at javacat.f2s.com (nick at javacat.f2s.com) Date: Fri, 13 Feb 2009 15:01:57 +0000 Subject: [Linux-cluster] gfs_controld plock errs In-Reply-To: <20090212155026.GA15337@redhat.com> References: <1234448045.49942ead3aeeb@webmail.freedom2surf.net> <20090212155026.GA15337@redhat.com> Message-ID: <1234537317.49958b6526a1a@webmail.freedom2surf.net> Quoting David Teigland : > On Thu, Feb 12, 2009 at 02:14:05PM +0000, nick at javacat.f2s.com wrote: > > Hi Folks > > > > > > RHEL 5.2 > > > > gfs2-utils-0.1.44-1.el5_2.1 > > kmod-gfs-PAE-0.1.23-5.el5 > > gfs-utils-0.1.17-1.el5 > > kmod-gfs2-PAE-1.92-1.1.el5 > > > > 5 nodes in a cluster just using GFS over iSCSI. > > > > Were getting the following error in /var/log/messages - > > > > Feb 11 09:17:00 finapp1 gfs_controld[7084]: plock result write err 0 errno 9 > > Feb 11 09:17:08 finapp1 gfs_controld[7084]: plock result write err 0 errno 9 > > Feb 12 13:02:25 finapp1 gfs_controld[7084]: plock result write err 0 errno 9 > > Feb 12 13:02:32 finapp1 gfs_controld[7084]: plock result write err 0 errno 9 > > > > And the users are complaining that Oracle occasionally runs slow. > > > > Could anyone help to explain what these errors mean please ? > > Harmless messages that have been removed, see here: > > http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=685498d154acfff23e4af7bfe874a7b0ed2eb9c5 > > Dave > Aha thanks for the info Dave :) From breaktime123 at yahoo.com Fri Feb 13 16:04:13 2009 From: breaktime123 at yahoo.com (break time) Date: Fri, 13 Feb 2009 08:04:13 -0800 (PST) Subject: [Linux-cluster] Virtual IP help Message-ID: <561530.27357.qm@web111107.mail.gq1.yahoo.com> Hi All, ? I am new to redhat cluster and RHEL 5.3. 
I could not find much info howto setup the virtual IP, so if the main server fails, this virtual IP (public) will map to failover host's IP (private).? I understand this concept from Veritas cluster, but not sure apply to redhat cluster. ? Please let me know howto setup the network for failover capabilities ? My test setup: Two servers with 2 NIC cards, one public and private IP Share storage is NFS Failover application is Apache ? Thanks bt ? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Fri Feb 13 16:44:20 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 13 Feb 2009 11:44:20 -0500 (EST) Subject: [Linux-cluster] Virtual IP help In-Reply-To: <561530.27357.qm@web111107.mail.gq1.yahoo.com> Message-ID: <1975540531.4903511234543460628.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "break time" wrote: | Hi All, | | I am new to redhat cluster and RHEL 5.3. | I could not find much info howto setup the virtual IP, so if the main | server fails, this virtual IP (public) will map to failover host's IP | (private). I understand this concept from Veritas cluster, but not | sure apply to redhat cluster. | | Please let me know howto setup the network for failover capabilities | | My test setup: | Two servers with 2 NIC cards, one public and private IP | Share storage is NFS | Failover application is Apache | | Thanks | bt Hi BT, Here's one place to start: http://sources.redhat.com/cluster/doc/nfscookbook.pdf "Bob Peterson's Unofficial NFS/GFS Cookbook" Regards, Bob Peterson Red Hat GFS From rmccabe at redhat.com Fri Feb 13 16:49:29 2009 From: rmccabe at redhat.com (Ryan McCabe) Date: Fri, 13 Feb 2009 11:49:29 -0500 Subject: [Linux-cluster] How to clean up conga database ? In-Reply-To: <49950342.60606@gmail.com> References: <49950342.60606@gmail.com> Message-ID: <20090213164929.GA2195@redhat.com> On Fri, Feb 13, 2009 at 01:21:06PM +0800, Kein He wrote: > Hi guys, > > > Is there anyway to clean up conga database ( back to state that before > "luci_admin init" )? I searched the official site, but i didn't find > anything useful. There's no way to restore it to that state other than removing the package (and the files left behind in /var/lib/luci and /usr/lib{,64}/luci) and reinstalling. If you want to remove all systems and clusters from the management interface, you can do that via the "Manage Systems" link in the "homebase" tab. Is there something specific you want to get rid of other than managed systems and clusters? Ryan From breaktime123 at yahoo.com Fri Feb 13 17:09:49 2009 From: breaktime123 at yahoo.com (break time) Date: Fri, 13 Feb 2009 09:09:49 -0800 (PST) Subject: [Linux-cluster] Virtual IP help In-Reply-To: <1975540531.4903511234543460628.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: <774347.11025.qm@web111114.mail.gq1.yahoo.com> ? Thanks Bob, ? This cookbook using LVS server for load balancing. Can I config failover cluster w/o LVS ? ? Minh ? --- On Fri, 2/13/09, Bob Peterson wrote: From: Bob Peterson Subject: Re: [Linux-cluster] Virtual IP help To: breaktime123 at yahoo.com, "linux clustering" Date: Friday, February 13, 2009, 8:44 AM ----- "break time" wrote: | Hi All, | | I am new to redhat cluster and RHEL 5.3. | I could not find much info howto setup the virtual IP, so if the main | server fails, this virtual IP (public) will map to failover host's IP | (private). I understand this concept from Veritas cluster, but not | sure apply to redhat cluster. 
| | Please let me know howto setup the network for failover capabilities | | My test setup: | Two servers with 2 NIC cards, one public and private IP | Share storage is NFS | Failover application is Apache | | Thanks | bt Hi BT, Here's one place to start: http://sources.redhat.com/cluster/doc/nfscookbook.pdf "Bob Peterson's Unofficial NFS/GFS Cookbook" Regards, Bob Peterson Red Hat GFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Fri Feb 13 17:44:03 2009 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 13 Feb 2009 12:44:03 -0500 (EST) Subject: [Linux-cluster] Virtual IP help In-Reply-To: <774347.11025.qm@web111114.mail.gq1.yahoo.com> Message-ID: <846676631.4913801234547043564.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> ----- "break time" wrote: | Thanks Bob, | | This cookbook using LVS server for load balancing. Can I config | failover cluster w/o LVS ? | | Minh Hi Minh, You don't need LVS at all for clustering. It's just an option. In fact, the configuration in the cookbook does not use LVS at all; it just presents it as an option. Many people only configure failover (active-passive) for their clusters. It all depends on what you want and need. Regards, Bob Peterson Red Hat GFS From lupok at saline.k12.mi.us Sat Feb 14 03:46:08 2009 From: lupok at saline.k12.mi.us (Ken Lupo) Date: Fri, 13 Feb 2009 22:46:08 -0500 Subject: [Linux-cluster] gfs is for a different cluster Message-ID: <4d87a25f0902131946y2744acf4uba265b88d0f707eb@mail.gmail.com> Hello, I am trying to remount a gfs2 partition that I created in a different cluster and I am getting: /sbin/mount.gfs2: fs is for a different cluster /sbin/mount.gfs2: error mounting lockproto lock_dlm How do I change the gfs cluster association? Thank you, Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at serioustechnology.com Sat Feb 14 14:37:29 2009 From: lists at serioustechnology.com (Geoffrey) Date: Sat, 14 Feb 2009 09:37:29 -0500 Subject: [Linux-cluster] hardware suggestions for a simple cluster Message-ID: <4996D729.5010408@serioustechnology.com> I'm hoping to put together a simple two machine cluster with some kind of shared disk. This is simply so that I can get some hands on experience with a cluster. The plan is to cluster the two machines, then run xen virtual machines on top of that. As I'm self employed, the key here is the best price. It doesn't have to be the best solution, but a workable one that will give me practical experience. Suggestions would be greatly appreciated. -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin From jakub.suchy at enlogit.cz Sat Feb 14 14:44:10 2009 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Sat, 14 Feb 2009 15:44:10 +0100 Subject: [Linux-cluster] hardware suggestions for a simple cluster In-Reply-To: <4996D729.5010408@serioustechnology.com> References: <4996D729.5010408@serioustechnology.com> Message-ID: <20090214144410.GB10479@galatea> Two simple virtual machines will do the job! I even tried running Vmware machines as a cluster and XEN inside (=your setup), it works too but it's kind of slow. Cheers, Jakub Geoffrey wrote: > I'm hoping to put together a simple two machine cluster with some kind > of shared disk. This is simply so that I can get some hands on > experience with a cluster. 
The plan is to cluster the two machines, > then run xen virtual machines on top of that. > > As I'm self employed, the key here is the best price. It doesn't have > to be the best solution, but a workable one that will give me practical > experience. > > Suggestions would be greatly appreciated. > > -- > Until later, Geoffrey > > Those who would give up essential Liberty, to purchase a little > temporary Safety, deserve neither Liberty nor Safety. > - Benjamin Franklin > > -- > Linux-cluster mailing list From linuxelf at gmail.com Sat Feb 14 14:48:21 2009 From: linuxelf at gmail.com (Stephen Gilbert) Date: Sat, 14 Feb 2009 09:48:21 -0500 Subject: [Linux-cluster] hardware suggestions for a simple cluster In-Reply-To: <4996D729.5010408@serioustechnology.com> References: <4996D729.5010408@serioustechnology.com> Message-ID: <4996D9B5.4010500@gmail.com> We use Linux-ha (http://linux-ha.org) and drbd instead of a shared disk. It plays nicely with Xen. In one instance, we've got two Xen servers, each housing 4 virtual servers in separate clusters, all sharing data with drbd. It works very well. We're also using CentOS, so none of the software used costs anything. Geoffrey wrote: > I'm hoping to put together a simple two machine cluster with some kind > of shared disk. This is simply so that I can get some hands on > experience with a cluster. The plan is to cluster the two machines, > then run xen virtual machines on top of that. > > As I'm self employed, the key here is the best price. It doesn't have > to be the best solution, but a workable one that will give me > practical experience. > > Suggestions would be greatly appreciated. > From jeff.sturm at eprize.com Sat Feb 14 17:40:37 2009 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Sat, 14 Feb 2009 12:40:37 -0500 Subject: [Linux-cluster] gfs is for a different cluster In-Reply-To: <4d87a25f0902131946y2744acf4uba265b88d0f707eb@mail.gmail.com> References: <4d87a25f0902131946y2744acf4uba265b88d0f707eb@mail.gmail.com> Message-ID: <64D0546C5EBBD147B75DE133D798665F021BA030@hugo.eprize.local> Have not tried this, but: $ man gfs2_tool ... sb device table [newvalue] View (and possibly replace) the name of the locking table in the file system superblock. The file system shouldn't be mounted by any client when you do this. Looks like this is what you need. The name of the locking table is typically :. ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Ken Lupo Sent: Friday, February 13, 2009 10:46 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] gfs is for a different cluster Hello, I am trying to remount a gfs2 partition that I created in a different cluster and I am getting: /sbin/mount.gfs2: fs is for a different cluster /sbin/mount.gfs2: error mounting lockproto lock_dlm How do I change the gfs cluster association? Thank you, Ken -------------- next part -------------- An HTML attachment was scrubbed... URL: From esggrupos at gmail.com Mon Feb 16 08:29:48 2009 From: esggrupos at gmail.com (ESGLinux) Date: Mon, 16 Feb 2009 09:29:48 +0100 Subject: [Linux-cluster] fence " node1" failed if etho down Message-ID: <3128ba140902160029v5bae32bfx10f64bec6a4fbd40@mail.gmail.com> Hello All, I have a cluster with two nodes running one service (mysql). The two nodes uses a ISCSI disk with gfs on it. I haven?t configured fencing at all. 
I have tested diferent situtations of fail and these are my results: If I halt node1 the service relocates to node2 - OK if I kill the process in node1 the services relocate to node2 - OK but if I unplug the wire of the ether device or make ifdown eth0 on node1 all the cluster fails. The service doesn?t relocate. In node2 I get the messages: Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188" Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188" Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed again and again. The node2 never runs the service and I try to reboot the node1 the computer hangs waiting for stopping the services. In this situation all I can do is to switch off the power of node1 and reboot the node2. This situation is not acceptable at all. I think the problem is just with fencing but I dont know how to apply to this situation ( I have RTFM from redhat site but I have seen how to apply it. :-( ) this is my cluster.conf file Any idea? references? Thanks in advance Greetings ESG -------------- next part -------------- An HTML attachment was scrubbed... URL: From brettcave at gmail.com Mon Feb 16 11:56:11 2009 From: brettcave at gmail.com (Brett Cave) Date: Mon, 16 Feb 2009 13:56:11 +0200 Subject: [Linux-cluster] qdisk on iscsi In-Reply-To: <200902130956.36116.grimme@atix.de> References: <200902130956.36116.grimme@atix.de> Message-ID: On Fri, Feb 13, 2009 at 10:56 AM, Marc Grimme wrote: > Hello together, > has anybody ever tried to use the qdisk on iscsi? > > I have problems with a qdisk on iscsi. If I disable the qdisk everything works > find. > > It looks like when the qdisk is started it cannot (somehow) successfully join > the fencedomain as cman_tool services shows JOIN_POST_WAIT for the > fencedomain (forever). And therefor the cluster works until a node has to be > fenced ;-( . > > Has anybody seen such strange things with the qdisk? Are there some > dependencies that are not yet supported for iscsi and qdisk? What errors do you get in your logs? Im running qdisk on a shared FC resource, but did initially have errors, if you could send through logs and cluster.conf > > Thanks for your help. > > Marc. > -- > Gruss / Regards, > > Marc Grimme > http://www.atix.de/ http://www.open-sharedroot.org/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From grimme at atix.de Mon Feb 16 14:54:25 2009 From: grimme at atix.de (Marc Grimme) Date: Mon, 16 Feb 2009 15:54:25 +0100 Subject: [Linux-cluster] qdisk on iscsi In-Reply-To: References: <200902130956.36116.grimme@atix.de> Message-ID: <200902161554.25418.grimme@atix.de> On Monday 16 February 2009 12:56:11 Brett Cave wrote: > What errors do you get in your logs? Im running qdisk on a shared FC > resource, but did initially have errors, if you could send through > logs and cluster.conf On a shared FC resource it works just fine. I just figured out that it runs when I'm using iscsi-target http://sourceforge.net/projects/iscsitarget/ whereas with scsi-tgt (provided by Redhat) or with NetApp ISCSI Implementation it does not (after the first node joins the fence_domain it stucks in JOIN_START_WAIT). I don't think that the cluster.conf helps very much because it works when I change the iscsi implementation everything works just fine. The relevant part is as follows: ... ... 
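(The quorumd stanza Marc refers to did not survive the list's HTML scrubbing, so nothing is reproduced here. As a generic illustration only, not his actual settings, the quorum disk and the cluster's view of it can be checked from any node with the stock RHEL 5 tools:)

mkqdisk -L          # list quorum disk devices and their labels
clustat             # shows cluster members and, when configured, the quorum disk
cman_tool status    # vote counts, including the quorum disk vote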
I was wandering if the qdisk requires special scsi functionality which I doubt but wanted to be sure. Or anybody else experianced a strange behaviour when using qdisk on iSCSI. Lon, perhaps you could just comment on this shortly. -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ From corey.kovacs at gmail.com Mon Feb 16 18:55:00 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Mon, 16 Feb 2009 18:55:00 +0000 Subject: [Linux-cluster] More GFS2 tuning... Message-ID: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> Fellow cluster folks, I am currently trying to get as much throughput as I can for a NFS cluster I am about to put into production but the numbers I am getting for throughput, like others have said, are dismal. My setup consists of 5 DL360-G5's w/8GB ram running RHEL5.3 x86_64 with dual 4G FC Qlogic cards, connected via 4G FC switches to an EVA8100, with 48 spindles in the diskgroup. The luns are between 500 and 750GB and I am using device-mapper multipath in round-robin with a rr_min_io of 250 and multibus. I've even adjusted the qlogic drivers to have a q depth of 64. By my reckoning, I should be able to see 400MB or more sustained throughput using this setup. If this is a pipe dream, someone let me know quick before I go nutz. When a large multi user, multi file, multi thread simulation of a total file output of 18GB is run, I plot the output of vmstat 1 and see a definite pattern with is very periodic. The bo values start at around 200MB, then drop down to 0 in most cases for a few seconds, then spike to ~700MB/s then eases back down to 200, 150 and back down to 0. It looks very much like a cacheing issue to me. These numbers are almost identical on the FC switches. I'd like to level it out a bit so that the average climbs up for a best general usage profile. This is going to be as mentioned above a NFS server exporting 1 export per node serving roughly 250 machines. I've read that GFS2 is supposed to be "self tuning" but I don't think these are necessarily GFS2 issues. I was even told by a redhat engineer at last years summit that I could expect to see up to 600-900 MB/s. Not sure I believe that one, but 400 seems doable. Anyone have something similar? What I/O rates are people getting? Might be useful to have use cases and configs on a wiki somewhere to let people compare results etc. Anyway, all help is welcome and I am willing to test near anything as long as I won't get arrested for it. Corey From gordan at bobich.net Mon Feb 16 19:09:09 2009 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 16 Feb 2009 19:09:09 +0000 Subject: [Linux-cluster] More GFS2 tuning... In-Reply-To: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> References: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> Message-ID: <4999B9D5.2060406@bobich.net> > When a large multi user, multi file, multi thread simulation of a > total file output of 18GB is run, I plot the output of vmstat 1 and > see a definite pattern with is very periodic. The bo values start at > around 200MB, then drop down to 0 in most cases for a few seconds, > then spike to ~700MB/s then eases back down to 200, 150 and back down > to 0. It looks very much like a cacheing issue to me. These numbers > are almost identical on the FC switches. How are your test files distributed across directories, and what is your ratio of reads to writes? Are you mounting with noatime,nodiratime,noquota? What is your clustering network connection? 
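(For reference, the mount options Gordan asks about go either in /etc/fstab or on the mount command line; the volume group, logical volume and mount point below are placeholders, not taken from Corey's setup:)

# /etc/fstab
/dev/clustervg/gfs2lv   /mnt/gfs2   gfs2   noatime,nodiratime   0 0

# or by hand, then confirm which options are actually in effect
mount -t gfs2 -o noatime,nodiratime /dev/clustervg/gfs2lv /mnt/gfs2
mount | grep gfs2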
If all your files are in the same directory (or a small number of subdirectories) and the access is distributed across all the nodes, then I have to say that you may well be out of luck and what you are seeing is normal. Bouncing directory locks between the nodes on each access will introduce enough latency to kill the performance. Also remember that no two nodes can have a lock on the same file at the same time, and for file creation/deletion, that means a directory lock, which in turn means only one file creation/deletion per directory at any one time. I can well believe the 900MB/s figure if you are just reading back one big file from multiple nodes. But the performance will fall off a cliff on random I/O involving writes. Gordan From jeff.sturm at eprize.com Mon Feb 16 19:26:32 2009 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Mon, 16 Feb 2009 14:26:32 -0500 Subject: [Linux-cluster] More GFS2 tuning... In-Reply-To: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> References: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> Message-ID: <64D0546C5EBBD147B75DE133D798665F021BA068@hugo.eprize.local> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Corey Kovacs > Sent: Monday, February 16, 2009 1:55 PM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] More GFS2 tuning... > > By my reckoning, I should be able to see 400MB or more > sustained throughput using this setup. If this is a pipe > dream, someone let me know quick before I go nutz. What do you get from the raw device? (I.e. if you remove GFS/NFS from the picture.) > The bo values start at around 200MB, then drop down to 0 in > most cases for a few seconds, then spike to ~700MB/s then > eases back down to 200, 150 and back down to 0. It looks very > much like a cacheing issue to me. Linux virtual memory does some funny things with fs caching. Try some tests with O_DIRECT to bypass the buffer cache. On RHEL 5 systems, you can achieve that with "dd ... oflag=direct" and varying block sizes. > I've read that GFS2 is supposed to be "self tuning" but I > don't think these are necessarily GFS2 issues. Agreed. If you can experiment with the hardware, what do you get from other fs types? (such as ext3) > Anyone have something similar? What I/O rates are people getting? I don't have any FC hardware quite as nice as yours, but multipathing AoE over a pair of GigE connections we can get 200MB/s raw, sequential throughput. (I.e. about the limits of the interconnects.) My GFS filesystems are mostly a collection of very small (~1MB or less) files, so, it's hard to say how they are performing. I'm much more concerned about the rate of file creates over GFS than raw throughput right now... Jeff From matt at bravenet.com Mon Feb 16 19:53:06 2009 From: matt at bravenet.com (Matthew Kent) Date: Mon, 16 Feb 2009 11:53:06 -0800 Subject: [Linux-cluster] qdisk on iscsi In-Reply-To: <200902161554.25418.grimme@atix.de> References: <200902130956.36116.grimme@atix.de> <200902161554.25418.grimme@atix.de> Message-ID: <1234813986.16621.30.camel@fuego> On Mon, 2009-02-16 at 15:54 +0100, Marc Grimme wrote: > On a shared FC resource it works just fine. I just figured out that it runs > when I'm using iscsi-target http://sourceforge.net/projects/iscsitarget/ > whereas with scsi-tgt (provided by Redhat) or with NetApp ISCSI > Implementation it does not (after the first node joins the fence_domain it > stucks in JOIN_START_WAIT). 
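(For context, the stuck join state described in the quoted message can be observed directly on the nodes; these are the standard RHEL 5 cluster inspection commands, shown only as a pointer:)

cman_tool services    # per-group state; a wedged fence domain shows up as JOIN_START_WAIT or JOIN_POST_WAIT
group_tool ls         # the same information as reported by groupd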
Certainly odd that one implementation would work and another wouldn't. > I was wandering if the qdisk requires special scsi functionality which I doubt > but wanted to be sure. Or anybody else experianced a strange behaviour when > using qdisk on iSCSI. Didn't encounter any issues in setting up RHEL 5.3's qdisk against iscsitarget and equallogic targets. Are you able to use a regular target off the NetApp on the same server outside of qdisk? Have it available and mounted on boot etc. -- Matthew Kent \ SA \ bravenet.com From grimme at atix.de Mon Feb 16 20:15:36 2009 From: grimme at atix.de (Marc Grimme) Date: Mon, 16 Feb 2009 21:15:36 +0100 Subject: [Linux-cluster] qdisk on iscsi In-Reply-To: <1234813986.16621.30.camel@fuego> References: <200902130956.36116.grimme@atix.de> <200902161554.25418.grimme@atix.de> <1234813986.16621.30.camel@fuego> Message-ID: <200902162115.36998.grimme@atix.de> On Monday 16 February 2009 20:53:06 Matthew Kent wrote: > On Mon, 2009-02-16 at 15:54 +0100, Marc Grimme wrote: > > On a shared FC resource it works just fine. I just figured out that it > > runs when I'm using iscsi-target > > http://sourceforge.net/projects/iscsitarget/ whereas with scsi-tgt > > (provided by Redhat) or with NetApp ISCSI > > Implementation it does not (after the first node joins the fence_domain > > it stucks in JOIN_START_WAIT). > > Certainly odd that one implementation would work and another wouldn't. Yes. I think the same. Strange enough. > > > I was wandering if the qdisk requires special scsi functionality which I > > doubt but wanted to be sure. Or anybody else experianced a strange > > behaviour when using qdisk on iSCSI. > > Didn't encounter any issues in setting up RHEL 5.3's qdisk against > iscsitarget and equallogic targets. Ok. iscsitarget works for me as well. > > Are you able to use a regular target off the NetApp on the same server > outside of qdisk? Have it available and mounted on boot etc. Yes we are booting from it. /boot is iSCSI and I think a bunch of other disks used by all guests (Oracle, SAP, rootfs, boot etc. All seem to work.). -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 | 85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org Registergericht: Amtsgericht Muenchen, Registernummer: HRB 168930, USt.-Id.: DE209485962 | Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) | Vorsitzender des Aufsichtsrats: Dr. Martin Buss From corey.kovacs at gmail.com Mon Feb 16 21:40:48 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Mon, 16 Feb 2009 21:40:48 +0000 Subject: [Linux-cluster] More GFS2 tuning... In-Reply-To: <4999B9D5.2060406@bobich.net> References: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> <4999B9D5.2060406@bobich.net> Message-ID: <7d6e8da40902161340u35feefbaj44bf6fbca3f45920@mail.gmail.com> On Mon, Feb 16, 2009 at 7:09 PM, Gordan Bobic wrote: > >> When a large multi user, multi file, multi thread simulation of a >> total file output of 18GB is run, I plot the output of vmstat 1 and >> see a definite pattern with is very periodic. The bo values start at >> around 200MB, then drop down to 0 in most cases for a few seconds, >> then spike to ~700MB/s then eases back down to 200, 150 and back down >> to 0. It looks very much like a cacheing issue to me. These numbers >> are almost identical on the FC switches. 
> > How are your test files distributed across directories, and what is your > ratio of reads to writes? Are you mounting with noatime,nodiratime,noquota? > What is your clustering network connection? The tests are from a single node in the cluster, using a single filesystem/mount point. I am doing writes only. I mount using noatime,nodiratime. When I tried noquota, I got an error for invalid option. > > If all your files are in the same directory (or a small number of > subdirectories) and the access is distributed across all the nodes, then I > have to say that you may well be out of luck and what you are seeing is > normal. Bouncing directory locks between the nodes on each access will > introduce enough latency to kill the performance. Also remember that no two > nodes can have a lock on the same file at the same time, and for file > creation/deletion, that means a directory lock, which in turn means only one > file creation/deletion per directory at any one time. I'll try it with a single file and check the difference. Thanks. -Corey > > I can well believe the 900MB/s figure if you are just reading back one big > file from multiple nodes. But the performance will fall off a cliff on > random I/O involving writes. > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From corey.kovacs at gmail.com Mon Feb 16 21:43:40 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Mon, 16 Feb 2009 21:43:40 +0000 Subject: [Linux-cluster] More GFS2 tuning... In-Reply-To: <64D0546C5EBBD147B75DE133D798665F021BA068@hugo.eprize.local> References: <7d6e8da40902161055h1d0d8429mfbfea11078c437e9@mail.gmail.com> <64D0546C5EBBD147B75DE133D798665F021BA068@hugo.eprize.local> Message-ID: <7d6e8da40902161343t4ae47d58u7a4381d0ee2033a7@mail.gmail.com> On Mon, Feb 16, 2009 at 7:26 PM, Jeff Sturm wrote: >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Corey Kovacs >> Sent: Monday, February 16, 2009 1:55 PM >> To: linux-cluster at redhat.com >> Subject: [Linux-cluster] More GFS2 tuning... >> >> By my reckoning, I should be able to see 400MB or more >> sustained throughput using this setup. If this is a pipe >> dream, someone let me know quick before I go nutz. > > What do you get from the raw device? (I.e. if you remove GFS/NFS from > the picture.) Haven't tried this yet, will do it in the morning. > >> The bo values start at around 200MB, then drop down to 0 in >> most cases for a few seconds, then spike to ~700MB/s then >> eases back down to 200, 150 and back down to 0. It looks very >> much like a cacheing issue to me. > > Linux virtual memory does some funny things with fs caching. Try some > tests with O_DIRECT to bypass the buffer cache. On RHEL 5 systems, you > can achieve that with "dd ... oflag=direct" and varying block sizes. > >> I've read that GFS2 is supposed to be "self tuning" but I >> don't think these are necessarily GFS2 issues. This is a new one for me, I'll try this in the morning. > Agreed. If you can experiment with the hardware, what do you get from > other fs types? (such as ext3) > These tests are on the roadmap as well. >> Anyone have something similar? What I/O rates are people getting? > > I don't have any FC hardware quite as nice as yours, but multipathing > AoE over a pair of GigE connections we can get 200MB/s raw, sequential > throughput. (I.e. about the limits of the interconnects.) 
> > My GFS filesystems are mostly a collection of very small (~1MB or less) > files, so, it's hard to say how they are performing. I'm much more > concerned about the rate of file creates over GFS than raw throughput > right now... > > Jeff > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thanks -Corey From lupok at saline.k12.mi.us Mon Feb 16 22:36:10 2009 From: lupok at saline.k12.mi.us (Ken Lupo) Date: Mon, 16 Feb 2009 17:36:10 -0500 Subject: [Linux-cluster] Clustering File Services Message-ID: <4d87a25f0902161436i49184150g84ba46bbfe31a1f7@mail.gmail.com> For those of you using Clustering and File Services, specifically Samba how you're doing it? Are you using CTDB or some other variation or Samba. I've been trying to setup CTDB on a clustered gfs2 file system and have been running into issues with the exact procedure for getting it running. Any help would be greatly appreciated. Thank you, Ken -- Ken Lupo, Saline Area Schools Office: 734.429.8014 Mobile: 248.881.5681 http://www.salineschools.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From spods at iinet.net.au Tue Feb 17 03:56:56 2009 From: spods at iinet.net.au (spods at iinet.net.au) Date: Tue, 17 Feb 2009 12:56:56 +0900 Subject: [Linux-cluster] fence ' node1' failed if etho down Message-ID: <60761.1234843016@iinet.net.au> A couple of things. You don't have any fencing devices defined in cluster.conf at all. No power fencing, no I/O fencing, not even manual fencing. You need to define how each node of the cluster is to be fenced (forcibly removed from the cluster) for proper failover operations to occur. Secondly, if the only connection shared between the two nodes is the network cord you just disconnected, then of course nothing will happen - each node has just lost the only common connection between each other to control the faulty node (i.e. through fencing). There need's to be more connections in between the nodes of a cluster than just a network card. This can be achieved with a second NIC, I/O fencing, centralised or individual power controls (I/O switches or IPMI). That way in the event that the network connection is the single point of failure between the two nodes, at least a node can be fenced if it's behaving improperly. Once the faulty node is fenced, the remaining nodes should at that point continue providing cluster services. Regards, Stewart On Mon Feb 16 16:29 , ESGLinux sent: >Hello All, > >I have a cluster with two nodes running one service (mysql). The two nodes uses a ISCSI disk with gfs on it. >I haven??t configured fencing at all. > >I have tested diferent situtations of fail and these are my results: > > >If I halt node1 the service relocates to node2 - OK >if I kill the process in node1 the services relocate to node2 - OK > >but > >if I unplug the wire of the ether device or make ifdown eth0 on node1 all the cluster fails. The service doesn??t relocate. > >In node2 I get the messages: > >Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188" >Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed >Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188" > >Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed > >again and again. The node2 never runs the service and I try to reboot the node1 the computer hangs waiting for stopping the services. > > >In this situation all I can do is to switch off the power of node1 and reboot the node2. 
This situation is not acceptable at all. > >I think the problem is just with fencing but I dont know how to apply to this situation ( I have RTFM from redhat site?? but I have seen how to apply it. :-( ) > > >this is my cluster.conf file > > >?????????????? > >?????????????? >?????????????????????????????? >?????????????????????????????????????????????? >?????????????????????????????? >?????????????????????????????? > >?????????????????????????????????????????????? >?????????????????????????????? >?????????????? >?????????????? >?????????????? > >?????????????? >?????????????????????????????? >?????????????????????????????????????????????? >?????????????????????????????????????????????????????????????? > >?????????????????????????????????????????????????????????????? >?????????????????????????????????????????????? >?????????????????????????????? >?????????????????????????????? > >?????????????????????????????? >?????????????????????????????????????????????? > >?????????????????????????????????????????????? >?????????????????????????????? >?????????????? > > >Any idea? references? > >Thanks in advance > > >Greetings > >ESG > > > > From esggrupos at gmail.com Tue Feb 17 08:48:39 2009 From: esggrupos at gmail.com (ESGLinux) Date: Tue, 17 Feb 2009 09:48:39 +0100 Subject: [Linux-cluster] fence ' node1' failed if etho down In-Reply-To: <60761.1234843016@iinet.net.au> References: <60761.1234843016@iinet.net.au> Message-ID: <3128ba140902170048s2e5c0897sde053ec81f64ccfa@mail.gmail.com> Hi, first, thank you very much for your answer, You are right, I have not fencing devices at all, but for one reason: I havent!!! I?m just testing with 2 xen virtual machines running on the same host and mounting an iscsi disk on other host to simulate shared storage. on the other hand, I think I don?t understand the concept of fencing, I try to configure fencing devices with luci, but when I try I don?t know what to select from the combo of fencing devices. (perphaps manual fencing, althoug its not recommended for production) so, as I think this is a newbie and perhaps a silly question, Can you give any good reference about fencing to learn about it or an example configuation with fence devices to see how it must be done thanks again, ESG 2009/2/17 spods at iinet.net.au > A couple of things. > > You don't have any fencing devices defined in cluster.conf at all. No > power > fencing, no I/O fencing, not even manual fencing. > > You need to define how each node of the cluster is to be fenced (forcibly > removed > from the cluster) for proper failover operations to occur. > > Secondly, if the only connection shared between the two nodes is the > network cord > you just disconnected, then of course nothing will happen - each node has > just > lost the only common connection between each other to control the faulty > node > (i.e. through fencing). > > There need's to be more connections in between the nodes of a cluster than > just a > network card. This can be achieved with a second NIC, I/O fencing, > centralised > or individual power controls (I/O switches or IPMI). > > That way in the event that the network connection is the single point of > failure > between the two nodes, at least a node can be fenced if it's behaving > improperly. > > Once the faulty node is fenced, the remaining nodes should at that point > continue > providing cluster services. > > Regards, > > Stewart > > > > > On Mon Feb 16 16:29 , ESGLinux sent: > > >Hello All, > > > >I have a cluster with two nodes running one service (mysql). 
The two nodes > uses > a ISCSI disk with gfs on it. > >I haven??t configured fencing at all. > > > >I have tested diferent situtations of fail and these are my results: > > > > > >If I halt node1 the service relocates to node2 - OK > >if I kill the process in node1 the services relocate to node2 - OK > > > >but > > > >if I unplug the wire of the ether device or make ifdown eth0 on node1 all > the > cluster fails. The service doesn??t relocate. > > > >In node2 I get the messages: > > > >Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188" > >Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed > >Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188" > > > >Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed > > > >again and again. The node2 never runs the service and I try to reboot the > node1 > the computer hangs waiting for stopping the services. > > > > > >In this situation all I can do is to switch off the power of node1 and > reboot > the node2. This situation is not acceptable at all. > > > >I think the problem is just with fencing but I dont know how to apply to > this > situation ( I have RTFM from redhat site? but I have seen how to apply it. > :-( ) > > > > > >this is my cluster.conf file > > > > > >? ? ? ? ? ? ? post_join_delay="3"/> > > > >? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? votes="1"> > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? votes="1"> > > > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? > >? ? ? ? ? ? ? > >? ? ? ? ? ? ? > > > >? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? name="DOMINIOFAIL" nofailback="0" > ordered="0" restricted="1"> > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > > > > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > > > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? name="BBDD" > revovery="restart"> > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? config_file="/etc/my.cnf" listen_address="" > mysql_options="" name="mydb" shutdown_wait="3"/> > > > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? monitor_link="1"/> > >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? > >? ? ? ? ? ? ? > > > > > >Any idea? references? > > > >Thanks in advance > > > > > >Greetings > > > >ESG > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.zg at gmail.com Tue Feb 17 16:56:50 2009 From: alan.zg at gmail.com (Alan A) Date: Tue, 17 Feb 2009 10:56:50 -0600 Subject: [Linux-cluster] Failure to use fence_node with APC switches after 5.3 update Message-ID: I just uped everything to RHEL 5.3 and this is what I get when trying to "fence_node nodename": [root at fendev03 ~]# fence_node fendev04.xxxxxxxxx.com agent "fence_apc" reports: Traceback (most recent call last): File "/sbin/fence_apc", line 207, in ? 
main() File "/sbin/fence_apc", line 191, in main fence_action(conn, options, set_power_status, get_power_status) File "/usr/lib/fence/fencing.py", line 355, in fence_a agent "fence_apc" reports: ction status = get_power_fn(tn, options) File "/sbin/fence_apc", line 82, in get_power_status status = re.compile("\s*"+options["-n"]+"-.*(ON|OFF)", re.IGNORECASE).search(result).group(1) AttributeError: 'NoneType' object has no attribute 'group agent "fence_apc" reports: ' Any opinions on this? -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaspar.castro at gmail.com Tue Feb 17 17:32:48 2009 From: gaspar.castro at gmail.com (Paulo Castro) Date: Tue, 17 Feb 2009 17:32:48 +0000 Subject: [Linux-cluster] du vs df vs gfs2_tool df /mountpoint Message-ID: <4e387ab30902170932l4d43d5b8u1cd4225a0fa47659@mail.gmail.com> Hi all. I have a GFS2 cluster of three machines using an iSCSI disk. Everything went fine on the initial tests and the cluster seems to work just great. A couple of days ago I submitted this cluster to a number of file creation operations and whilst this was providing enough load to see some performance data it was also inadvertently deleting some of the files being created for the purpose of the test whilst these files were still being written to. I though this shouldn't be a problem and that a simple fsck would be able to recover the space lost on those inodes. But that doesn't seem the case and I'll show why. DF output: Filesystem Size Used Avail Use% Mounted on /dev/mapper/gfsvg-gfslv 200G 200G 839M 100% /gfs du -sh output: [root at vmcluster1 gfs]# du -sh 101G . gfs2_tool output: [root at vmcluster1 gfs]# gfs2_tool df /gfs /gfs: SB lock proto = "lock_dlm" SB lock table = "iscsicluster:hd" SB ondisk format = 1801 SB multihost format = 1900 Block size = 4096 Journals = 3 Resource Groups = 800 Mounted lock proto = "lock_dlm" Mounted lock table = "iscsicluster:hd" Mounted host data = "jid=0:id=65537:first=1" Journal number = 0 Lock module flags = 0 Local flocks = FALSE Local caching = FALSE Type Total Used Free use% ------------------------------------------------------------------------ data 52423184 52208404 214780 100% inodes 215085 305 214780 0% As you can see by du's output there's only 101G being used and df's reporting the FS to be 100% used. I've run fsck.gf2 several times on one box on all the boxes but it just doesn't seem to be fixing this right. It reminds me of a post I saw https://bugzilla.redhat.com/show_bug.cgi?id=325151 . When I ran the first fsck I did saw a lot of issues being fixed and I could almost swear that the first time I mounted the cluster FS after repairing it was reporting the right size again. The next time I mounted it it all reverted back to what I show above. I'm using [root at vmcluster1 gfs]# gfs2_tool version gfs2_tool 0.1.44 (built Jul 6 2008 10:57:30) Copyright (C) Red Hat, Inc. 2004-2006 All rights reserved. Anyone with a similar issue ?! Cheers, PECastro -------------- next part -------------- An HTML attachment was scrubbed... URL: From Gary_Hunt at gallup.com Tue Feb 17 20:12:02 2009 From: Gary_Hunt at gallup.com (Hunt, Gary) Date: Tue, 17 Feb 2009 14:12:02 -0600 Subject: [Linux-cluster] Quorum disk Message-ID: Having an issue with my 2 node cluster. Think it is related to the quorum disk. 2 node RHEL 5.3 cluster with quorum disk. Virtual servers running on each node. Whenever node1 takes over the master role in qdisk it looses quorum and restarts all the virtual servers. 
It does regain quorum a few seconds later. If node1 is already the master and I fail node2; things work as expected. Node2 doesn't seem to have a problem taking over master role. Whenever node1 needs to take over master role the cluster looses quorum. Here is my cluster.conf. Any suggestions on what may be causing this? Thanks Gary -------------- next part -------------- An HTML attachment was scrubbed... URL: From billpp at gmail.com Tue Feb 17 21:19:10 2009 From: billpp at gmail.com (Flavio Junior) Date: Tue, 17 Feb 2009 18:19:10 -0300 Subject: [Linux-cluster] Clustering File Services In-Reply-To: <4d87a25f0902161436i49184150g84ba46bbfe31a1f7@mail.gmail.com> References: <4d87a25f0902161436i49184150g84ba46bbfe31a1f7@mail.gmail.com> Message-ID: <58aa8d780902171319x2243221dwe05561e52ec6ac5a@mail.gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Ken... CTDB has a non-official channel for discussion under irc.freenode.net #ctdb btw, RHCS has a channel under the same server as #linux-cluster Some developers, advanced and newbie users are there for discussion this nice thing :) I'm using redhat cluster-suite and ctdb, with GFS2 and GFS1 without problems... If you can, go chat there with us. If you cant, post information about your problem and we try to help. - -- Fl?vio do Carmo J?nior aka waKKu -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: http://getfiregpg.org iEYEARECAAYFAkmbKdcACgkQgyuXjr6dyktxwACg0wGBgK4I+iTVegvBjdu7h1qP wpsAoM7/rSfW+DtZ9H0/3DkA0uH53YJ5 =D/Fd -----END PGP SIGNATURE----- 2009/2/16 Ken Lupo : > For those of you using Clustering and File Services, specifically Samba how > you're doing it? Are you using CTDB or some other variation or Samba. I've > been trying to setup CTDB on a clustered gfs2 file system and have been > running into issues with the exact procedure for getting it running. Any > help would be greatly appreciated. > > Thank you, > Ken > > -- > Ken Lupo, Saline Area Schools > Office: 734.429.8014 > Mobile: 248.881.5681 > http://www.salineschools.com > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From alan.zg at gmail.com Tue Feb 17 22:35:54 2009 From: alan.zg at gmail.com (Alan A) Date: Tue, 17 Feb 2009 16:35:54 -0600 Subject: [Linux-cluster] scsi reservation issue with GFS Message-ID: Did anyone experience this? Any suggestions to fixing this error? 
Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009281 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009409 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009537 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009665 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009793 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009921 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11010049 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11010177 Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018 Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11010305 -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohara at redhat.com Tue Feb 17 22:48:48 2009 From: rohara at redhat.com (Ryan O'Hara) Date: Tue, 17 Feb 2009 16:48:48 -0600 Subject: [Linux-cluster] scsi reservation issue with GFS In-Reply-To: References: Message-ID: <20090217224848.GA10498@redhat.com> Can you dump the registered keys and reservation key? # get a list of keys resgitered for a device sg_persist -i -k # tells you which key holding the reservation sg_persist -i -r It appears that this node is trying to access /dev/sda but is not registered with the device. WERO reservations will only allow resgistered nodes to write to the disk. Is this RHEL4? RHEL5? On Tue, Feb 17, 2009 at 04:35:54PM -0600, Alan A wrote: > Did anyone experience this? Any suggestions to fixing this error? 
> > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009281 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009409 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009537 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009665 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009793 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11009921 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11010049 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11010177 > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict > Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = > 0x00000018 > Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector > 11010305 > > > -- > Alan A. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From garromo at us.ibm.com Tue Feb 17 23:23:20 2009 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 17 Feb 2009 16:23:20 -0700 Subject: [Linux-cluster] scsi reservation issue with GFS In-Reply-To: Message-ID: We had this issue a long time ago. What we did was remove the sg3_utils rpm and then did a chkconfig scsi_reserve off Gary Alan A To Sent by: linux clustering linux-cluster-bou nces at redhat.com cc Subject 02/17/2009 03:35 [Linux-cluster] scsi reservation PM issue with GFS Please respond to linux clustering Did anyone experience this? Any suggestions to fixing this error? 
Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018
Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009281
Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict
Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: SCSI error: return code = 0x00000018
Feb 17 16:32:18 fendev04 kernel: end_request: I/O error, dev sda, sector 11009409
Feb 17 16:32:18 fendev04 kernel: sd 0:0:0:0: reservation conflict
[same three lines repeated for sectors 11009537 through 11010305]

--
Alan A.

From rohara at redhat.com  Tue Feb 17 23:37:46 2009
From: rohara at redhat.com (Ryan O'Hara)
Date: Tue, 17 Feb 2009 17:37:46 -0600
Subject: [Linux-cluster] scsi reservation issue with GFS
In-Reply-To:
References:
Message-ID: <20090217233746.GB10498@redhat.com>

On Tue, Feb 17, 2009 at 04:23:20PM -0700, Gary Romo wrote:
>
> We had this issue a long time ago.
> What we did was remove the sg3_utils rpm and then did a chkconfig
> scsi_reserve off

Ahh, yes. If you don't intend to use SCSI-3 reservations, you
definitely need to turn off scsi_reserve. Thanks for pointing this
out, Gary.
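
(A minimal sketch of the workaround Gary and Ryan describe, to be run on every
node. It assumes the cluster does not rely on SCSI-3 persistent reservations
for fencing, e.g. fence_scsi; if it does, disabling scsi_reserve will break
fencing.)

    chkconfig scsi_reserve off      # stop the init script from registering keys at boot
    chkconfig --list scsi_reserve   # confirm it is now off in all runlevels
    # rpm -e sg3_utils              # optional, what Gary did; note this also removes sg_persist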
From sdake at redhat.com  Wed Feb 18 02:07:00 2009
From: sdake at redhat.com (Steven Dake)
Date: Tue, 17 Feb 2009 19:07:00 -0700
Subject: [Linux-cluster] [patch] website about crypt.c
In-Reply-To: <20090202.193354.42662687400854549.yamato@redhat.com>
References: <4986A3A5.9090207@nsn.com> <001f01c98515$42eb8af0$c8c2a0d0$@gr>
	<20090202.193354.42662687400854549.yamato@redhat.com>
Message-ID: <1234922820.5799.21.camel@sdake-laptop>

Merged in whitetank and corosync trunk.

Thanks again. Also please copy openais at lists.osdl.org in the future,
since it makes tracking patches easier for me.

-steve

On Mon, 2009-02-02 at 19:33 +0900, Masatake YAMATO wrote:
> It seems that libtomcrypt.org is moved.
>
> Masatake YAMATO
>
> Index: exec/crypto.c
> ===================================================================
> --- exec/crypto.c	(revision 1752)
> +++ exec/crypto.c	(working copy)
> @@ -6,7 +6,7 @@
>   * The library is free for all purposes without any express
>   * guarantee it works.
>   *
> - * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.org
> + * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.com/
>   */
>  #include
>  #include

From sdake at redhat.com  Wed Feb 18 02:12:07 2009
From: sdake at redhat.com (Steven Dake)
Date: Tue, 17 Feb 2009 19:12:07 -0700
Subject: [Linux-cluster] [PATCH] use defined constant instead of raw literal in totemsrp.c
In-Reply-To: <20090202.163414.938233730596300941.yamato@redhat.com>
References: <20090130.170240.56227421577108127.yamato@redhat.com>
	<20090202.163414.938233730596300941.yamato@redhat.com>
Message-ID: <1234923127.5799.22.camel@sdake-laptop>

Merged whitetank and corosync trunk.

Thanks again!

On Mon, 2009-02-02 at 16:34 +0900, Masatake YAMATO wrote:
> Could you apply this patch if appreciated?
>
> Masatake YAMATO
>
> Index: exec/totemsrp.c
> ===================================================================
> --- exec/totemsrp.c	(revision 1752)
> +++ exec/totemsrp.c	(working copy)
> @@ -1534,7 +1534,7 @@
>  		sizeof (struct iovec) * recovery_message_item->iov_len);
>  	} else {
>  		mcast = recovery_message_item->iovec[0].iov_base;
> -		if (mcast->header.encapsulated == 1) {
> +		if (mcast->header.encapsulated == MESSAGE_ENCAPSULATED) {
>  			/*
>  			 * Message is a recovery message encapsulated
>  			 * in a new ring message
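
(A sketch of how one of these emailed diffs can be tried against a local
checkout before it is merged; the patch filename and checkout directory below
are illustrative assumptions, not taken from the thread.)

    cd openais-trunk                                # your local checkout of the tree
    patch -p0 --dry-run < totemsrp-constant.patch   # check that the diff applies cleanly
    patch -p0 < totemsrp-constant.patch             # apply it; paths such as exec/totemsrp.c are relative to the top of the tree
    svn diff exec/totemsrp.c                        # review the resulting change (if the checkout is svn)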

From sdake at redhat.com  Wed Feb 18 02:14:03 2009
From: sdake at redhat.com (Steven Dake)
Date: Tue, 17 Feb 2009 19:14:03 -0700
Subject: [Linux-cluster] [PATCH] Remove redundant statement
In-Reply-To: <20090130.170240.56227421577108127.yamato@redhat.com>
References: <20090130.170240.56227421577108127.yamato@redhat.com>
Message-ID: <1234923243.5799.23.camel@sdake-laptop>

Merged whitetank and corosync trunk.

Sorry for the delay. Had an outage on the main system with all my ssh keys.

Regards
-steve

On Fri, 2009-01-30 at 17:02 +0900, Masatake YAMATO wrote:
> I've found a redundant statement in totemsrp.c.
> Could you apply the following patch?
>
> Masatake YAMATO
>
> Index: totemsrp.c
> ===================================================================
> --- totemsrp.c	(revision 1752)
> +++ totemsrp.c	(working copy)
> @@ -2608,7 +2608,6 @@
>  	orf_token.header.encapsulated = 0;
>  	orf_token.header.nodeid = instance->my_id.addr[0].nodeid;
>  	assert (orf_token.header.nodeid);
> -	orf_token.seq = 0;
>  	orf_token.seq = SEQNO_START_MSG;
>  	orf_token.token_seq = SEQNO_START_TOKEN;
>  	orf_token.retrans_flg = 1;

From sghosh at redhat.com  Wed Feb 18 03:43:26 2009
From: sghosh at redhat.com (Subhendu Ghosh)
Date: Tue, 17 Feb 2009 22:43:26 -0500
Subject: [Linux-cluster] [patch] website about crypt.c
In-Reply-To: <1234922820.5799.21.camel@sdake-laptop>
References: <4986A3A5.9090207@nsn.com> <001f01c98515$42eb8af0$c8c2a0d0$@gr>
	<20090202.193354.42662687400854549.yamato@redhat.com>
	<1234922820.5799.21.camel@sdake-laptop>
Message-ID: <499B83DE.7090706@redhat.com>

Are these websites correct?

Steven Dake wrote:
> Merged in whitetank and corosync trunk.
>
> Thanks again. Also please copy openais at lists.osdl.org in the future
> since it makes tracking patches easier for me.
>
> -steve
>
> On Mon, 2009-02-02 at 19:33 +0900, Masatake YAMATO wrote:
>> It seems that libtomcrypt.org is moved.
>>
>> Masatake YAMATO
>>
>> Index: exec/crypto.c
>> [...]
>> - * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.org
>> + * Tom St Denis, tomstdenis at iahu.ca, http://libtomcrypt.com/
>> [...]

From sreejithemk at gmail.com  Wed Feb 18 09:29:48 2009
From: sreejithemk at gmail.com (Sreejith K)
Date: Wed, 18 Feb 2009 14:59:48 +0530
Subject: [Linux-cluster] Two node Cluster
Message-ID: <3a60e0820902180129v56c9bd75yda6e64768b7210d1@mail.gmail.com>

Hi,

I want to set up a two-node cluster using cman.
Here are the steps I followed:

=========================================
Cluster nodes: node1 & node2
=========================================
[root at node1 ~]# cman_tool status
Version: 6.1.0
Config Version: 5
Cluster Name: k7
Cluster Id: 269
Cluster Member: Yes
Cluster Generation: 48
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: node1
Node ID: 2
Multicast addresses: 239.192.1.14
Node addresses: 10.10.10.40

[root at node2 ~]# cman_tool status
Version: 6.1.0
Config Version: 5
Cluster Name: k7
Cluster Id: 269
Cluster Member: Yes
Cluster Generation: 48
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: node2
Node ID: 1
Multicast addresses: 239.192.1.14
Node addresses: 10.10.10.39

=========================================
Logical volumes on node node1
=========================================
/dev/vg1/lvol0 formatted as GFS

[root at node1 ~]# gfs_mkfs -p lock-dlm -t k7:CLVM0 -j 2 /dev/vg1/lvol0
[root at node1 ~]# mkfs -t gfs -p lock-dlm -t k7:CLVM0 -j 2 /dev/vg1/lvol0

=========================================
Procedures on node node1
=========================================
[root at node1 ~]# service cman start
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done                                [  OK  ]
[root at node1 ~]# service clvmd start
Starting clvmd:                                            [  OK  ]
Activating VGs:   1 logical volume(s) in volume group "vg1" now active
  Error locking on node node2: Command timed out           [  OK  ]
[root at node1 ~]# gnbd_serv -n
gnbd_serv: startup succeeded
[root at node1 ~]# gnbd_export -c -e gnbd1 -d /dev/vg1/lvol0
gnbd_export: created GNBD gnbd1 serving file /dev/vg1/lvol0
[root at node1 ~]# mount /dev/vg1/lvol0 /mnt/gfs_local/
[root at node1 ~]# cd /mnt/gfs_local/
[root at node1 gfs_local]# ls
fence.css   mantisbt-1.1.6.tar.gz     vkarmalicense.lic
JMeter.pdf  mysql-cheat-sheet-v1.pdf
[root at node1 gfs_local]#

=========================================
Procedures on node node2
=========================================
[root at node2 ~]# service cman start
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done                                [  OK  ]
[root at node2 ~]# service clvmd start
Starting clvmd:                                            [  OK  ]
[root at node2 ~]# gnbd_serv -n
gnbd_serv: startup succeeded
[root at node2 ~]# gnbd_import -i 10.10.10.40
[root at node2 ~]# mount /dev/gnbd/gnbd1 -o lockproto=lock_dlm /mnt/gfs_gnbd/
[root at node2 ~]# cd /mnt/gfs_gnbd/
[root at node2 gfs_gnbd]# ls
fence.css   mantisbt-1.1.6.tar.gz     vkarmalicense.lic
JMeter.pdf  mysql-cheat-sheet-v1.pdf

My /etc/cluster/cluster.conf is:

Why does the message "Error locking on node node2: Command timed out" pop up
every time? Can a cluster be created without specifying fencing and all that
stuff? I just wanted to use GFS over a clustered environment.

Sreejith K

K 7 - C O M P U T I N G
www.k7computing.com
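
(A short diagnostic sketch for the "Error locking on node node2: Command timed
out" message above. The checks are illustrative suggestions, assume RHEL 5
packages (lvmconf is shipped by lvm2-cluster), and should be run on both nodes.
On the fencing question: cman expects a fence section for each node in
cluster.conf; test setups commonly use fence_manual, though that is not
supported for production.)

    grep locking_type /etc/lvm/lvm.conf   # clustered LVM needs locking_type = 3 on every node
    lvmconf --enable-cluster              # sets locking_type = 3; restart clvmd afterwards
    service clvmd status                  # clvmd must be running on both nodes before activating clustered VGs
    cman_tool nodes                       # both nodes should show status "M" (member)
    group_tool ls                         # fence/dlm groups that never leave the join state often point at fencing problems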

From marcos.david at efacec.pt  Wed Feb 18 10:16:37 2009
From: marcos.david at efacec.pt (Marcos David)
Date: Wed, 18 Feb 2009 10:16:37 +0000
Subject: [Linux-cluster] Problems with IP addresses
Message-ID: <499BE005.2010506@efacec.pt>

Hi,

I have a 4-node cluster with two NICs, one private (used for cluster
purposes) and one public (used for external services).

I have configured a service with two resources, an IP address and a script.

My problem is that when the external NIC is down, the cluster reports the
following error:

Feb 18 10:08:07 node4_pub clurgmgrd[4144]: status on ip "10.11.0.53" returned 2 (invalid argument(s))
Feb 18 10:08:07 node4_pub clurgmgrd[4144]: Stopping service service:test1

and the service stays permanently stopped, even when the external NIC comes
up again.

I've tried two different configurations for the service: