From riaan at obsidian.co.za Thu Jun 1 08:27:19 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Thu, 1 Jun 2006 10:27:19 +0200 (SAST) Subject: [Linux-cluster] Choose between broadcast and multicast for cman Message-ID: Is there any reason or advantage to use multicast over broadcast, which offsets the complexity of multicast (relative to broadcast), e.g more control, less traffic, others? The documentation is not very helpful. The RHCS 4 manual just says "choose one". The RHCS 3 manual section 3.6.1 says "Multicast heartbeating over a channel-bonded Ethernet interface provides good fault tolerance and is recommended for availability." This looks more like a recommendation of channel-bonding versus standalone interfaces than recommending multicast over broadcast, since fault tolerance and availability are offered by channel-bonding, not multicast. -- Riaan van Niekerk Systems Architect Obsidian Red Hat Consulting Obsidian Systems www.obsidian.co.za Cel: +27 82 921 8768 Tel: +27 11 792 6500 Fax: +27 11 792 6522 From pcaulfie at redhat.com Thu Jun 1 08:43:45 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 01 Jun 2006 09:43:45 +0100 Subject: [Linux-cluster] Choose between broadcast and multicast for cman In-Reply-To: References: Message-ID: <447EA8C1.6010203@redhat.com> Riaan van Niekerk wrote: > Is there any reason or advantage to use multicast over broadcast, which > offsets the complexity of multicast (relative to broadcast), e.g more > control, less traffic, others? Generally multicast behaves the same as broadcast. The only time you would need multicast is if your nodes are on different subnets - in which case you would also have to make sure the router had sufficiently low latencies to support clustering over it. -- patrick From c_triantafillou at hotmail.com Thu Jun 1 10:57:32 2006 From: c_triantafillou at hotmail.com (Christos Triantafillou) Date: Thu, 01 Jun 2006 12:57:32 +0200 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: <447DCBC7.9020803@redhat.com> Message-ID: >You mean that the users are using the default lockspace even >though the lockspace that was created by root was a different one? Strange. yes, that is what is happenning: I have got these devices: crwxrwxrwx 1 root root 10, 62 May 30 21:32 dlm-control crwxrwxrwx 1 root root 10, 61 May 30 21:32 dlm_default crwxrwxrwx 1 root root 10, 61 May 31 17:03 dlm_kobe and I can now run all the user tests as a non-root user: # lstest -o -r -l default Opening lockspace default locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8 unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8 but # lstest -o -l default Opening lockspace default locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf Releasing ls default release ls: Operation not permitted Regards, Christos From pcaulfie at redhat.com Thu Jun 1 12:02:23 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 01 Jun 2006 13:02:23 +0100 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: References: Message-ID: <447ED74F.1040601@redhat.com> Christos Triantafillou wrote: >> You mean that the users are using the default lockspace even >> though the lockspace that was created by root was a different one? >> Strange. 
> > yes, that is what is happenning: > I have got these devices: > crwxrwxrwx 1 root root 10, 62 May 30 21:32 dlm-control > crwxrwxrwx 1 root root 10, 61 May 30 21:32 dlm_default > crwxrwxrwx 1 root root 10, 61 May 31 17:03 dlm_kobe > > and I can now run all the user tests as a non-root user: > # lstest -o -r -l default > Opening lockspace default > locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8 > unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8 > > but > # lstest -o -l default > Opening lockspace default > locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf > unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf > Releasing ls default > release ls: Operation not permitted > It looks like the default lockspace didn't get released when the all references to it disappeared (unless you have something holding it open!). I'm not sure how that might happen -- patrick From riaan at obsidian.co.za Thu Jun 1 12:47:27 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Thu, 1 Jun 2006 14:47:27 +0200 (SAST) Subject: [Linux-cluster] Choose between broadcast and multicast for cman In-Reply-To: Message-ID: On Thu, 1 Jun 2006, Riaan van Niekerk wrote: > Riaan van Niekerk wrote: > > Is there any reason or advantage to use multicast over broadcast, which > > offsets the complexity of multicast (relative to broadcast), e.g more > > control, less traffic, others? > > Generally multicast behaves the same as broadcast. The only time you would > need multicast is if your nodes are on different subnets - in which case > you > would also have to make sure the router had sufficiently low latencies to > support clustering over it. > > tnx Patrick The RHCS 4 documentation does not give a recommendation for a multicast address. cman man page and http://sources.redhat.com/cluster/doc/usage.txt mention 224.0.0.1, which is a non-routable multicast address range, 224.0.0.0/24 . So if I understand this correctly: Using this address or anything in the 224.0.0.0/24 range would give the exact same effect as long as nodes are on the same subnet. If nodes are on different subnets, a multicast address in another network should be used (e.g. RHCS 3 defaults to 225.0.0.11). Riaan From pcaulfie at redhat.com Thu Jun 1 13:08:23 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 01 Jun 2006 14:08:23 +0100 Subject: [Linux-cluster] Choose between broadcast and multicast for cman In-Reply-To: References: Message-ID: <447EE6C7.3040101@redhat.com> Riaan van Niekerk wrote: > On Thu, 1 Jun 2006, Riaan van Niekerk wrote: > >> Riaan van Niekerk wrote: >>> Is there any reason or advantage to use multicast over broadcast, which >>> offsets the complexity of multicast (relative to broadcast), e.g more >>> control, less traffic, others? >> Generally multicast behaves the same as broadcast. The only time you would >> need multicast is if your nodes are on different subnets - in which case >> you >> would also have to make sure the router had sufficiently low latencies to >> support clustering over it. >> >> > tnx Patrick > > > The RHCS 4 documentation does not give a recommendation for a multicast > address. cman man page and http://sources.redhat.com/cluster/doc/usage.txt > mention 224.0.0.1, which is a non-routable multicast address range, > 224.0.0.0/24 . > > So if I understand this correctly: > > Using this address or anything in the 224.0.0.0/24 range would give the > exact same effect as long as nodes are on the same subnet. 
If nodes are on > different subnets, a multicast address in another network should be used > (e.g. RHCS 3 defaults to 225.0.0.11). > Exactly. I don't know just how that mcast address got into the documentation but I suspect it's my fault. I do seem to remember giving someone a config file with that address in it at some time :-) -- patrick From riaan at obsidian.co.za Thu Jun 1 20:38:32 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Thu, 1 Jun 2006 22:38:32 +0200 (SAST) Subject: [Linux-cluster] Oracle RAC and Cluster Suite, can they coexist on the same machine? In-Reply-To: Message-ID: On Thu, 1 Jun 2006, Riaan van Niekerk wrote: > Hello there, > > I would like to know if there is any problem when using the Oracle Real > Application Clusters to mange the Oracle database and Red Hat Cluster > Suite to manage an application on the same machine? > > Remember that the application will not be entirely managed by the RHCS, it > will be ative on all nodes and the RHCS will only be used to manage > virtual IP reallocation and mounting points. > > Is there any problems reported so far with this configuration? > > Does Oracle RAC and RHCS can coexist without problems? > > Any comments on this type of configuration? > > I have not run a setup like this, but I do not see this as being a problem. You do not give version numbers for any of the RHCS or RAC, nor if GFS is involved, but if you are running GFS 6.1/RHEL 4, you are already running RHCS. Either way, as long as you do not do anything to make scripts, mountpoints or virtual IPs/ports clash between RHCS and RAC, you should be fine, IMHO. Riaan From Klaus.Steinberger at physik.uni-muenchen.de Fri Jun 2 05:54:28 2006 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Fri, 2 Jun 2006 07:54:28 +0200 Subject: [Linux-cluster] GFS - lost filespace during gfs_grow Message-ID: <200606020754.28501.Klaus.Steinberger@physik.uni-muenchen.de> Hello, I have the following problem: I tried to expand a GFS filesystem from 2 TByte to 3 TByte. FIrst I expanded successfully the Logical Volume (sits on a FC storage) Then I tried "gfs_grow -v /export/data/etp". The last thing it wrote out: Preparing to write new FS information After that the load at least on one of the other nodes running the NFS Service has gone up (80 - 130), I did not see any big activity on the storage, but DLM lock events on the node running gfs_grow. After some long time (around 20 Minutes) the node running gfs_grow crashed with an OOPS ( please see the screenshot at http://www.physik.uni-muenchen.de/~klaus.steinberger/crash-dlm.png ). With df it looks like that only part of the new space was added: [root at etpopt03 ~]# df /export/data/etp Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/etpdata-etp 2427166768 2105093784 322072984 87% /export/data/etp [root at etpopt03 ~]# Further gfs_grow commands tell: [root at etpopt03 ~]# gfs_grow -Tv /export/data/etp Device has grown by less than 100 blocks.... skipping [root at etpopt03 ~]# There are 8 journals with standard size (so at most 128 Mbyte should be used for the journals), so it looks like around 500 - 600 MByte are missing. I run Scientific Linux 4.2 (which is similar to RHEL 4.2) How could I recover the lost space? 
Sincerly, Klaus -- Klaus Steinberger Maier-Leibnitz Labor Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~k2/ In a world without Walls and Fences, who needs Windows and Gates From Olivier.Thibault at lmpt.univ-tours.fr Fri Jun 2 08:26:17 2006 From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault) Date: Fri, 02 Jun 2006 10:26:17 +0200 Subject: [Linux-cluster] gfs export over nfs is very slow In-Reply-To: <447C633C.1090508@lmpt.univ-tours.fr> References: <20060530145918.9049.qmail@webmail46.rediffmail.com> <447C633C.1090508@lmpt.univ-tours.fr> Message-ID: <447FF629.6020307@lmpt.univ-tours.fr> Hi, I have upgraded FC5, and it's now much better. For information, here is a bonnie++ test result, on gfs exported via nfs, gigabit ethernet lan. Version 1.01d ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP poisson 4G 21383 19 21582 6 4026 75 24101 21 22974 3 259.8 1 ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 158 1 517 91 248 2 157 1 3243 22 238 2 Locally, the same test is more than twice faster. Does someone knows if there are optimizations for gfs and nfs, other than ones found in NFS Howto ? Best regards, Olivier Olivier Thibault a ?crit : > Hi, > > Raj Kumar a ?crit : > > Hi, > > > > We are using GFS6.0 (no cluster suite) and NFS exports of the file > system. I am getting a transfer rate of about 35MB/sec. We have a high > speed SAN. Actually the transfer rate can be little higher but we > attribute the slow rate to NFS itself since we see the same numbers for > EXT3 also. > > > > Regards, > > Raj > > > > > > Thank you for your answer. > I am upgrading to last GFS/DLM/CMAN kernel stuff and will retry. > I've ran bonnie++ with ext3 exported over nfs and it is really speeder > even if it's not what i expected. I got about 22 MB/s (r/w). > But i saw that nfsd was consuming a lot of CPU. The system load was 15 !! > I've also ran test with Suse SLES9 xfs exported over nfs. I got 40MB/s, > which is what aim to get with GFS ... > I don't understand ... > > Is there anybody who export gfs over nfs with FC5 ? > > Thanks by advance > > Olivier > >> On Tue, 30 May 2006 Olivier Thibault wrote : >>> Hi, >>> >>> >>> I am testing RHCS on Fedora Core 5. >>> I have a shared gfs volume mounted on two nodes (using clvmd and >>> lock_dlm). >>> Locally, everything is ok. >>> If I export the gfs volume via nfs, i obtain *very poor* performance. >>> For exemple, from a nfs client with dd, it take 90 seconds to create >>> a 16 MB file !!! >>> From the cluster's nodes, the performances a good, and i made some >>> tests exporting xfs over nfs, and it was good too. >>> So what's wrong with nfs+gfs ? >>> I would be very interested to know how guys who use this have >>> configured it, and what performances they have. >>> >>> Thanks for any advices. >>> >>> Best regards >>> >>> -- Olivier THIBAULT >>> Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083) >>> Universit? 
Fran?ois Rabelais >>> Parc de Grandmont - 37200 TOURS >>> T?l: +33 2 47 36 69 12 >>> Fax: +33 2 47 36 69 56 >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > From c_triantafillou at hotmail.com Fri Jun 2 13:28:51 2006 From: c_triantafillou at hotmail.com (Christos Triantafillou) Date: Fri, 02 Jun 2006 15:28:51 +0200 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: <447ED74F.1040601@redhat.com> Message-ID: Patrick, Is the query functionality implemented yet? I defined QUERY in dlmtest.c and I am getting this: # dlmtest -Q locking LOCK-NAME EX ...done (lkid = 102da) Query failed: Invalid argument unlocking LOCK-NAME...done Regards, Christos >From: Patrick Caulfield >Reply-To: linux clustering >To: linux clustering >Subject: Re: [Linux-cluster] DLM & RedHat Enterprise Linux >Date: Thu, 01 Jun 2006 13:02:23 +0100 >MIME-Version: 1.0 >Received: from hormel.redhat.com ([209.132.177.30]) by >bay0-mc4-f3.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 1 >Jun 2006 05:02:31 -0700 >Received: from listman.util.phx.redhat.com (listman.util.phx.redhat.com >[10.8.4.110])by hormel.redhat.com (Postfix) with ESMTPid E91BC72E86; Thu, >1 Jun 2006 08:02:28 -0400 (EDT) >Received: from int-mx1.corp.redhat.com >(int-mx1.corp.redhat.com[172.16.52.254])by listman.util.phx.redhat.com >(8.13.1/8.13.1) with ESMTP idk51C2RXT025641 for >;Thu, 1 Jun 2006 08:02:27 -0400 >Received: from pobox.surrey.redhat.com (pobox.surrey.redhat.com >[172.16.10.17])by int-mx1.corp.redhat.com (8.12.11.20060308/8.12.11) with >ESMTP idk51C2QqF024437 for ;Thu, 1 >Jun 2006 08:02:26 -0400 >Received: from [192.168.1.2] (vpn-68-1.surrey.redhat.com [10.32.68.1])by >pobox.surrey.redhat.com (8.12.11.20060308/8.12.11) with ESMTP >idk51C2ODh016194for ; Thu, 1 Jun 2006 13:02:25 >+0100 >X-Message-Info: LsUYwwHHNt1hwMoPuwvRWIu68qUsjYIZZ2SgBHK6+k0= >Organization: Red Hat >User-Agent: Thunderbird 1.5 (X11/20051201) >References: >X-Enigmail-Version: 0.94.0.0 >X-loop: linux-cluster at redhat.com >X-BeenThere: linux-cluster at redhat.com >X-Mailman-Version: 2.1.5 >Precedence: junk >List-Id: linux clustering >List-Unsubscribe: >, >List-Archive: >List-Post: >List-Help: >List-Subscribe: >, >Errors-To: linux-cluster-bounces at redhat.com >Return-Path: linux-cluster-bounces at redhat.com >X-OriginalArrivalTime: 01 Jun 2006 12:02:32.0305 (UTC) >FILETIME=[43464610:01C68573] > >Christos Triantafillou wrote: > >> You mean that the users are using the default lockspace even > >> though the lockspace that was created by root was a different one? > >> Strange. 
> > > > yes, that is what is happenning: > > I have got these devices: > > crwxrwxrwx 1 root root 10, 62 May 30 21:32 dlm-control > > crwxrwxrwx 1 root root 10, 61 May 30 21:32 dlm_default > > crwxrwxrwx 1 root root 10, 61 May 31 17:03 dlm_kobe > > > > and I can now run all the user tests as a non-root user: > > # lstest -o -r -l default > > Opening lockspace default > > locking LOCK-NAME EX ...ast called, status = 0, lkid=103a8 > > unlocking LOCK-NAME...ast called, status = 65538, lkid=103a8 > > > > but > > # lstest -o -l default > > Opening lockspace default > > locking LOCK-NAME EX ...ast called, status = 0, lkid=100cf > > unlocking LOCK-NAME...ast called, status = 65538, lkid=100cf > > Releasing ls default > > release ls: Operation not permitted > > > >It looks like the default lockspace didn't get released when the all >references to it disappeared (unless you have something holding it open!). > >I'm not sure how that might happen > >-- > >patrick > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Fri Jun 2 13:40:36 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 02 Jun 2006 14:40:36 +0100 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: References: Message-ID: <44803FD4.7060209@redhat.com> Christos Triantafillou wrote: > Patrick, > > Is the query functionality implemented yet? > I defined QUERY in dlmtest.c and I am getting this: > # dlmtest -Q > locking LOCK-NAME EX ...done (lkid = 102da) > Query failed: Invalid argument > unlocking LOCK-NAME...done > It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream) code on CVS head. If you had to define QUERY, then it sounds like you're using CVS head. -- patrick From c_triantafillou at hotmail.com Fri Jun 2 15:36:54 2006 From: c_triantafillou at hotmail.com (Christos Triantafillou) Date: Fri, 02 Jun 2006 17:36:54 +0200 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: <44803FD4.7060209@redhat.com> Message-ID: >It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream) code >on >CVS head. > >If you had to define QUERY, then it sounds like you're using CVS head. I had to define QUERY because it is tested in libdlm.h from the cluster source. How can I get the stable version/headers? From pcaulfie at redhat.com Fri Jun 2 15:46:50 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 02 Jun 2006 16:46:50 +0100 Subject: [Linux-cluster] DLM & RedHat Enterprise Linux In-Reply-To: References: Message-ID: <44805D6A.4090305@redhat.com> Christos Triantafillou wrote: >> It's implemented in RHEL4 & STABLE, but not in the new (GFS2 stream) >> code on >> CVS head. >> >> If you had to define QUERY, then it sounds like you're using CVS head. > > I had to define QUERY because it is tested in libdlm.h from the cluster > source. > > How can I get the stable version/headers? > Checkout from CVS tag STABLE or download the tarball from sources.redhat.com. But if you're also using cman & dlm from CVS head, the libdlm from STABLE won't work with it. They must match as the code in HEAD is largely a rewrite. 
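For reference, a STABLE checkout should just be the usual cvs co against the sources.redhat.com repository with the -r STABLE tag, something like this (untested from here, and the exact CVS root may differ for anonymous access):

   cvs -d :ext:sources.redhat.com:/cvs/cluster co -r STABLE cluster
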
-- patrick From sdake at redhat.com Thu Jun 1 13:51:24 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 01 Jun 2006 06:51:24 -0700 Subject: [Linux-cluster] Choose between broadcast and multicast for cman In-Reply-To: <447EA8C1.6010203@redhat.com> References: <447EA8C1.6010203@redhat.com> Message-ID: <1149169884.21510.2.camel@shih.broked.org> On Thu, 2006-06-01 at 09:43 +0100, Patrick Caulfield wrote: > Riaan van Niekerk wrote: > > Is there any reason or advantage to use multicast over broadcast, which > > offsets the complexity of multicast (relative to broadcast), e.g more > > control, less traffic, others? > > Generally multicast behaves the same as broadcast. The only time you would > need multicast is if your nodes are on different subnets - in which case you > would also have to make sure the router had sufficiently low latencies to > support clustering over it. > > > To add to Patrick's comments, some multicast switches (managed) support IGMP which allows multicasted packets to only be sent to ports which are subscribed to a specific multicast address. This can increase throughput on those nodes that are not part of the cluster and hence shouldn't be receiving those broadcasts. Keep in mind that some switches IGMP implementation is broken. Regards -steve From rajkum2002 at rediffmail.com Fri Jun 2 21:39:31 2006 From: rajkum2002 at rediffmail.com (Raj Kumar) Date: 2 Jun 2006 21:39:31 -0000 Subject: [Linux-cluster] gfs export over nfs is very slow Message-ID: <20060602213931.19959.qmail@webmail49.rediffmail.com> Hi, I found that we are also experiencing performance issues using GFS. Earlier we thought EXT3 and GFS were performing equally so NFS is the issue. But I found that some tests were done incorrectly and EXT3 over NFS is twice faster than GFS over NFS. We formatted a SAN volume as EXT3 and benchmarked it on NFS client. We formatted the same SAN volume as GFS and benchmarked again. GFS + NFS is very slow. I have also read NFS tuning guides and tried several options. But no change whatsoever. Are there any other ways to debug this issue. It's a top most priority for us now. Thanks, Raj On Fri, 02 Jun 2006 Olivier Thibault wrote : >Hi, > >I have upgraded FC5, and it's now much better. >For information, here is a bonnie++ test result, on gfs exported via nfs, gigabit ethernet lan. >Version 1.01d ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >poisson 4G 21383 19 21582 6 4026 75 24101 21 22974 3 259.8 1 > ------Sequential Create------ --------Random Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 158 1 517 91 248 2 157 1 3243 22 238 2 > > >Locally, the same test is more than twice faster. >Does someone knows if there are optimizations for gfs and nfs, other than ones found in NFS Howto ? > >Best regards, > >Olivier > >Olivier Thibault a ?crit : >>Hi, >> >>Raj Kumar a ?crit : >> > Hi, >> > >> > We are using GFS6.0 (no cluster suite) and NFS exports of the file system. I am getting a transfer rate of about 35MB/sec. We have a high speed SAN. Actually the transfer rate can be little higher but we attribute the slow rate to NFS itself since we see the same numbers for EXT3 also. >> > >> > Regards, >> > Raj >> > >> > >> >>Thank you for your answer. >>I am upgrading to last GFS/DLM/CMAN kernel stuff and will retry. 
>>I've ran bonnie++ with ext3 exported over nfs and it is really speeder even if it's not what i expected. I got about 22 MB/s (r/w). >>But i saw that nfsd was consuming a lot of CPU. The system load was 15 !! >>I've also ran test with Suse SLES9 xfs exported over nfs. I got 40MB/s, which is what aim to get with GFS ... >>I don't understand ... >> >>Is there anybody who export gfs over nfs with FC5 ? >> >>Thanks by advance >> >>Olivier >> >>>On Tue, 30 May 2006 Olivier Thibault wrote : >>>>Hi, >>>> >>>> >>>>I am testing RHCS on Fedora Core 5. >>>>I have a shared gfs volume mounted on two nodes (using clvmd and lock_dlm). >>>>Locally, everything is ok. >>>>If I export the gfs volume via nfs, i obtain *very poor* performance. >>>>For exemple, from a nfs client with dd, it take 90 seconds to create a 16 MB file !!! >>>> From the cluster's nodes, the performances a good, and i made some tests exporting xfs over nfs, and it was good too. >>>>So what's wrong with nfs+gfs ? >>>>I would be very interested to know how guys who use this have configured it, and what performances they have. >>>> >>>>Thanks for any advices. >>>> >>>>Best regards >>>> >>>>-- Olivier THIBAULT >>>>Laboratoire de Math?matiques et Physique Th?orique (UMR CNRS 6083) >>>>Universit? Fran?ois Rabelais >>>>Parc de Grandmont - 37200 TOURS >>>>T?l: +33 2 47 36 69 12 >>>>Fax: +33 2 47 36 69 56 >>>> >>>>-- Linux-cluster mailing list >>>>Linux-cluster at redhat.com >>>>https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >>>------------------------------------------------------------------------ >>> >>>-- Linux-cluster mailing list >>>Linux-cluster at redhat.com >>>https://www.redhat.com/mailman/listinfo/linux-cluster >> > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From wcheng at redhat.com Sat Jun 3 00:22:54 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Fri, 02 Jun 2006 20:22:54 -0400 Subject: [Linux-cluster] gfs export over nfs is very slow In-Reply-To: <20060602213931.19959.qmail@webmail49.rediffmail.com> References: <20060602213931.19959.qmail@webmail49.rediffmail.com> Message-ID: <1149294175.5424.42.camel@localhost.localdomain> Try these: 1) Note that 2.6 kernel defaults NFS export option to "sync". So unless you have a strong need, explicitly set NFS export to "async" in your export file (/etc/exports in Red Hat systems) and do *not* mount NFS shares with "sync" option. 2) Upon large NFS append (i.e. "write" that will increase file size), using bigger block size (e.g. when using dd command) and/or bigger application buffer AND increasing NFS wsize and rsize (mount option) to its maximum, *if* you have to use "sync" in as either export or mount option. The problem here could be GFS sync due to its cluster filesystem nature and its file sync design. So try to avoid it if possible. Let us know how it goes. -- Wendy From carl at e2-media.co.nz Sun Jun 4 01:38:33 2006 From: carl at e2-media.co.nz (Carl Bowden) Date: Sun, 4 Jun 2006 13:38:33 +1200 Subject: [Linux-cluster] cluster.conf Documentation/DTD Message-ID: Hi, Is there any Documentation, other than the cluster.conf(8) man page on the cluster.conf XML I'm specifily looking to find out how to declare a 'Fence Domain' name in the cluster and to check what is 'valid' for the cluster.conf file At this stage this is the only options for the element I have seen: or is this in-fact a silly thing to try and do? 
any pointer to some more info would be very helpful Cheers, Carl. "To understand recursion, you must first understand recursion". ---------------------------------- Carl Bowden carl at e2-media.co.nz e2media Ltd 2nd Floor, 160 Cashel St PO BOX 22 128 Christchurch New Zealand Ph +64 3 377 0007 Fx +64 3 377 6582 M +021 338 410 From Tomasz.Koczorowski at centertel.pl Mon Jun 5 07:56:38 2006 From: Tomasz.Koczorowski at centertel.pl (Tomasz Koczorowski) Date: Mon, 5 Jun 2006 09:56:38 +0200 Subject: [Linux-cluster] Failed service Message-ID: <7E36A75EF0ABE243954B6E5AFA35D86645B717@EXCH-BK.centertel.main> Hi, I have encountered following problem: developers tried to upgrade clustered application on RHCS4 without stopping related service. Unfortunately while the application was stopped, cluster tried to execute start script with status parameter - it failed. After that cluster tried to relocate the service but stopping failed (beacuse service was already stopped) and the service state was changed to failed (shared filesystem and ip address were not removed from node). Later developers started the service (without notifying CS). So now application is running but service state is failed. I can enable the service but I will have to execute following commands on failed node thus stopping important application: clusvcadm -d my_service clusvcadm -e my_service Is there a possibility to enable failed service without stopping and starting it? Regards, Tomasz Koczorowski From bill.scherer at verizonwireless.com Mon Jun 5 18:31:22 2006 From: bill.scherer at verizonwireless.com (Bill Scherer) Date: Mon, 05 Jun 2006 14:31:22 -0400 Subject: [Linux-cluster] Shared Filesystem In-Reply-To: References: Message-ID: <4484787A.2050800@verizonwireless.com> Joe Warren-Meeks wrote: > > Hey there, > > I've got an iscsi array, on which I have a load of content to be > served via http. I basically want to have a bunch of linux boxes > mount the same partition read-only, via iscsi, so that they can do this. > > Now, I know I need GFS > Why? You said it's read only. NFS can handle this, and it's a lot easier to set up and maintain. > and one of the locking daemons (possibly > gulm?), but do I need anything else from the cluster suite? > > Anyone got any pointers on where to look for this info? I don't want > to set up a full-blown cluster, just share one partition between > multiple machines. > > Cheers! > > -- joe. > > Joe Warren-Meeks T: +44 (0) 208 962 0007 > Aggregator Limited M: +44 (0) 7789 176078 > Unit 62/63 Pall Mall Deposit E: joe.warren-meeks at aggregator.tv > 124-128 Barlby Road, London W10 6BL > PGP Fingerprint: 361F 78D0 56F5 8D7F 2639 947D 71E2 8811 F825 64CC > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From joe.warren-meeks at aggregator.tv Mon Jun 5 21:21:52 2006 From: joe.warren-meeks at aggregator.tv (Joe Warren-Meeks) Date: Mon, 5 Jun 2006 22:21:52 +0100 Subject: [Linux-cluster] Shared Filesystem In-Reply-To: <4484787A.2050800@verizonwireless.com> References: <4484787A.2050800@verizonwireless.com> Message-ID: On 5 Jun 2006, at 19:31, Bill Scherer wrote: Hey there, > Why? You said it's read only. NFS can handle this, and it's a lot > easier to set up and maintain. Mainly scalability and performance. I've done NFS based systems before and they've had problems once you pass a certain number of clients unless you spend a fortune on NetApps or the equivalent. Looks like I'm going to go with Storagetek or netapps though. 
I'd rather have used something like the Equallogic iscsi box, though. -- joe. Joe Warren-Meeks T: +44 (0) 208 962 0007 Aggregator Limited M: +44 (0) 7789 176078 Unit 62/63 Pall Mall Deposit E: joe.warren-meeks at aggregator.tv 124-128 Barlby Road, London W10 6BL PGP Fingerprint: 361F 78D0 56F5 8D7F 2639 947D 71E2 8811 F825 64CC From rick at espresolutions.com Mon Jun 5 22:59:23 2006 From: rick at espresolutions.com (Rick Bansal) Date: Mon, 5 Jun 2006 17:59:23 -0500 Subject: [Linux-cluster] Configuring cluster for direct routing Message-ID: <200606060310.k563A9Xe032719@mx1.redhat.com> Hello, Has anyone setup a RH cluster using direct routing. I read on the RH site that DR is not offically supported but can be done. If anyone has any insight into this, your help would be greatly appreciated. Regards, Rick Bansal From riaan at obsidian.co.za Tue Jun 6 08:22:51 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Tue, 6 Jun 2006 10:22:51 +0200 (SAST) Subject: [Linux-cluster] Configuring cluster for direct routing In-Reply-To: <200606060310.k563A9Xe032719@mx1.redhat.com> Message-ID: On Mon, 5 Jun 2006, Rick Bansal wrote: > Hello, > > Has anyone setup a RH cluster using direct routing. I read on the RH site > that DR is not offically supported but can be done. If anyone has any > insight into this, your help would be greatly appreciated. > > Regards, > Rick Bansal > > Where do you read that it is not supported? I remember reading something like that, but cannot find it in the RHCS manual. I have not set up Direct Routing in IPVS mysel, but this Red Hat Magazine Tips & Tricks http://www.redhat.com/magazine/014dec05/departments/tips_tricks/ , second item, contains a nice writeup by Lon on how to do it. It does not say anything about being supported or not. I would contact Red Hat Global Support Services and ask if it is supported. The SLA for Cluster Suite does not go into that level of detail (even though it explicitly excludes manual fencing for production workloads): http://www.redhat.com/support/service/sla/defs_cluster/ha.html Riaan From ben.yarwood at juno.co.uk Tue Jun 6 10:16:46 2006 From: ben.yarwood at juno.co.uk (Ben Yarwood) Date: Tue, 6 Jun 2006 11:16:46 +0100 Subject: [Linux-cluster] Backup File System Message-ID: <0de101c68952$52375780$3964a8c0@WS076> Presently we have a 3 node production GFS system for which I am creating a backup system. The production system has a hardware raid device attached as the shared storage and I have a second virtually identical storage device and host that I want to use for the backup. The backup device and host will eventually be at a separate physical location to the production system (once the two systems are syncronised) so I don't want it to be part of the existing cluster. In the event of failure of the production storage, the simplest solution would be to physically transfer the backup storage to the production environment and swap the devices. Can anyone suggest the best way to set up the backup file system? If I used a one node cluster as the backup, would it be a) possible to convert the locking from nolock to dlm? and b) Convert the ClusterName and FSName Is it possible to convert other file systems to GFS ones so that no cluster infrastructure is needed for the backup? 
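(If I read the gfs_tool man page correctly, both the locking protocol and the ClusterName:FSName table live in the superblock and can be rewritten on an unmounted filesystem, so the nolock/dlm conversion would look something like the lines below. The device path and names are only placeholders for our setup, and I have not tried this myself, so corrections are welcome:

   # only with the filesystem unmounted on every node
   gfs_tool sb /dev/backupvg/gfslv proto lock_dlm
   gfs_tool sb /dev/backupvg/gfslv table production_cluster:backupfs
)
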
Regards Ben From riaan at obsidian.co.za Tue Jun 6 13:23:01 2006 From: riaan at obsidian.co.za (riaan at obsidian.co.za) Date: Tue, 06 Jun 2006 15:23:01 +0200 Subject: [Linux-cluster] post_fail_delay Message-ID: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net> Having researched post_fail_delay in the archives extensively, I have the following scenario and question: I would like for an errant GFS node to be able to create network/disk dumps before being power fenced. Am I missing something, or is this leaving the errand node unfenced for any significant amount of time (enough to complete the dump, assuming it is upwards of a few seconds) just a bad idea? AFAIUnderstand, the whole idea of fencing is to prevent the node from damaging the file system in the first place, making the collection of dumps and power fencing fundamentally at odds with each other. The only way I can see fencing/dumping being used togeather is with I/O fencing (and I/O fencing alone, e.g. no power fencing as a second level). The cluster I/O fences the node immediately, but it remains up to be able to complete the dump. Recovery entails rebooting & re-enabling the port (all manual). However, post_fail_delay is still set to 0. To summarize, as I see it (please feel free to correct) To ensure data integrity: - Always use a post_fail_delay of 0, whether you are using power or I/O fencing. - When using power fencing (alone or with I/O fencing), you cannot use netdump/diskdump - otherwise the server will be fenced (rebooted) before being able to complete the dump. - When you must have the ability to netdump/diskdump, use I/O fencing (and only I/O fencing), and time the manual restore/unfence so that the dump has time to complete tnx Riaan ---------------------------------------------------------------- This message was sent using Obsidian Online web-mail. Obsidian Online - a division of Obsidian Systems (Pty) Ltd. http://www.obsidianonline.net/ From rick at espresolutions.com Tue Jun 6 15:20:28 2006 From: rick at espresolutions.com (Rick Bansal) Date: Tue, 6 Jun 2006 10:20:28 -0500 Subject: [Linux-cluster] Configuring cluster for direct routing In-Reply-To: Message-ID: <200606061520.k56FKdhs020198@mx1.redhat.com> Thanks for the response. Can't remember where I read it either. I'm trying to find it again. I'll provide the link once I find it. Regards, Rick -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Riaan van Niekerk Sent: Tuesday, June 06, 2006 3:23 AM To: linux clustering Subject: Re: [Linux-cluster] Configuring cluster for direct routing On Mon, 5 Jun 2006, Rick Bansal wrote: > Hello, > > Has anyone setup a RH cluster using direct routing. I read on the RH site > that DR is not offically supported but can be done. If anyone has any > insight into this, your help would be greatly appreciated. > > Regards, > Rick Bansal > > Where do you read that it is not supported? I remember reading something like that, but cannot find it in the RHCS manual. I have not set up Direct Routing in IPVS mysel, but this Red Hat Magazine Tips & Tricks http://www.redhat.com/magazine/014dec05/departments/tips_tricks/ , second item, contains a nice writeup by Lon on how to do it. It does not say anything about being supported or not. I would contact Red Hat Global Support Services and ask if it is supported. 
The SLA for Cluster Suite does not go into that level of detail (even though it explicitly excludes manual fencing for production workloads): http://www.redhat.com/support/service/sla/defs_cluster/ha.html Riaan -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From mathieu.avila at seanodes.com Tue Jun 6 16:38:31 2006 From: mathieu.avila at seanodes.com (Mathieu Avila) Date: Tue, 06 Jun 2006 18:38:31 +0200 Subject: [Linux-cluster] Is fenced service started ? Message-ID: <4485AF87.7090609@seanodes.com> Hi all, I am trying to automate the starting and stopping of a GFS filesystem (GFS 6.1). I am doing these things : - On start : /etc/init.d/ccsd start /etc/init.d/cman start /etc/init.d/fenced start /etc/init.d/gfs start And then mount -t gfs device mountpoint - On stop : umount device, /etc/init.d/gfs stop /etc/init.d/fenced stop /etc/init.d/cman stop /etc/init.d/ccsd stop This goes fine most of the time, but not always. Sometimes I get things like this: "lock_dlm: fence domain not found; check fenced" in syslog at mount time, although /etc/init.d/fenced was properly started. In fact, the fence daemon did not have enough time to initialize itself completely (/etc/cluster/services). The same can happen if i start immediately after a stop, as the fencing daemon does not have time to completely exit when i try to run it again. Is there a clean way to test if fenced is completely started or failed ? Looping over /etc/cluster/services does not sound appropriate and quite clean. Doing a "sleep 10" is not a good option neither. Any idea is welcome. -- Mathieu Avila From Matthew.Patton.ctr at osd.mil Tue Jun 6 18:28:48 2006 From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E) Date: Tue, 6 Jun 2006 14:28:48 -0400 Subject: [Linux-cluster] Is fenced service started ? Message-ID: Classification: UNCLASSIFIED > /etc/init.d/fenced start / stop neither should return until they are actually done. For example in the DNS start/stop script there is a sleep which is really sad way to go about it but works for now. I find the quality of the start/stop scripts in general to be summer-intern grade. Can't comment on how good or bad other distro's are but maybe they are all similarly flakey. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 2212 bytes Desc: not available URL: From rpeterso at redhat.com Tue Jun 6 18:51:19 2006 From: rpeterso at redhat.com (Robert S Peterson) Date: Tue, 06 Jun 2006 13:51:19 -0500 Subject: [Linux-cluster] [GFS2] gfs2 utils now available (experimental) Message-ID: <1149619879.6183.50.camel@technetium.msp.redhat.com> Hi Folks, For anyone who wants to start doing preliminary playing with GFS2: This morning, I finished my first version of the user-land tools for the new GFS2 filesystem and made them available in Red Hat's public CVS repository. Feel free to review them and/or try them out. (See warning at the bottom). The tools are as follows (with some comments): 1. libgfs2 This is a new library that the other tools rely upon and link against. In GFS1, each tool had its own way of doing things, and that was prone to mistakes. Now the tools all use a standard library of gfs2 functions, and more problems can be fixed in one place rather than many. 2. gfs2_convert This tool allows you to convert a gfs1 filesystem to gfs2 format. 
There are some minor differences between the gfs1 and gfs2 on-disk format that allows gfs2 to have better performance. So we wrote a tool to convert from one to the other. This tool also requires new library libgfs.a, which is in the gfs branch. 3. gfs2_fsck GFS2 filesystem checker. Enough said. Still needs some work. 4. mkfs.gfs2 GFS2 mkfs program. This will be incorporating udev's "libvolume_id.a" library for determining if a filesystem exists on the device, and what type. In GFS1, we used to do this in a home-grown fashion. Now we're going to start using a standard library. Unfortunately, libvolume_id.a doesn't exist on many systems yet, but that is planned, and we're all set to use it when it's there. In the meantime, we've got it stubbed in with some #ifdefs around. 5. gfs2_edit This is an internal filesystem debugging and editing tool we use here. It can be used to hex-edit the filesystem or print gfs2 data structures. It's a very dangerous tool in the wrong hands, but it has its uses. We've thrown out "gfs_debug" and incorporated the functionality into gfs2_edit. I'm planning to expand its capabilities in the future, to aid in data recovery for badly damaged filesystems that can't be mounted. For example, I'm planning to add the capability to copy files out of an unmounted fs using the tool. We're still working on gfs2_jadd and gfs2_grow, and of course, the GFS2 kernel modules are being incorporated into the upstream kernels. To get the whole cluster suite source code, with the gfs2 directory, use CVS. Do something like this: cvs -d :ext:sources.redhat.com:/cvs/cluster co -r HEAD cluster On the web at: http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs2/?cvsroot=cluster NOTE: This is for the user tools only. The GFS2 kernel source is lying in a public git tree on kernel.org, and should also be considered experimental: git clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6.git gfs2-2.6 (Be forewarned: This is very big and takes a long time). WARNING: These tools are still experimental and I'm sure there are still problems, which we're still working on. So don't trust valuable data to it yet. After all, we are still in development mode. Some might think I'm premature to release these tools before they're rock-solid, but it's also valid that the open source community should get a look at them as soon as it became feasible in the spirit of release early/often. Maybe you can ferret out mistakes, problems or issues I've overlooked. Questions and comments are welcome. Once again, Red Hat puts its money where its mouth is regarding open source and the open source community. Enjoy. Regards, Bob Peterson Red Hat Cluster Suite From rajkum2002 at rediffmail.com Tue Jun 6 19:41:35 2006 From: rajkum2002 at rediffmail.com (Raj Kumar) Date: 6 Jun 2006 19:41:35 -0000 Subject: [Linux-cluster] gfs export over nfs is very slow Message-ID: <20060606194135.30308.qmail@webmail55.rediffmail.com> We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips? Thanks, Raj ? On Sat, 03 Jun 2006 Wendy Cheng wrote : >Try these: > >1) Note that 2.6 kernel defaults NFS export option to "sync". So unless >you have a strong need, explicitly set NFS export to "async" in your >export file (/etc/exports in Red Hat systems) and do *not* mount NFS >shares with "sync" option. > >2) Upon large NFS append (i.e. "write" that will increase file size), >using bigger block size (e.g. 
when using dd command) and/or bigger >application buffer AND increasing NFS wsize and rsize (mount option) to >its maximum, *if* you have to use "sync" in as either export or mount >option. > >The problem here could be GFS sync due to its cluster filesystem nature >and its file sync design. So try to avoid it if possible. > >Let us know how it goes. > >-- Wendy > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wcheng at redhat.com Tue Jun 6 19:45:19 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Tue, 06 Jun 2006 15:45:19 -0400 Subject: [Linux-cluster] gfs export over nfs is very slow In-Reply-To: <20060606194135.30308.qmail@webmail55.rediffmail.com> References: <20060606194135.30308.qmail@webmail55.rediffmail.com> Message-ID: <4485DB4F.9070502@redhat.com> Raj Kumar wrote: > We are using GFS6.0 and RHEL3 servers. I have been using "async" for > all my NFS exports. The speed that I reported is also using "async" > option. Any other tips? > Was the "speed" measured by bonnie++ and only by bonnie++ ? -- Wendy From rajkum2002 at rediffmail.com Tue Jun 6 20:54:35 2006 From: rajkum2002 at rediffmail.com (Raj Kumar) Date: 6 Jun 2006 20:54:35 -0000 Subject: [Linux-cluster] gfs export over nfs is very slow Message-ID: <20060606205435.15785.qmail@webmail46.rediffmail.com> No. I am measuring read speed. I used command "time cat test* > /dev/null" ?to time reading 400 files. All our files are 33MB in size. Our processing applications reads many such files in real time and processes them. The app is falling behind because the read speed is less than 20MB/sec but it got to be able to read at least 2 files per second on average... so we need about 50-60Mb/sec transfer rate on our NFS clients. We could get about 45MB/sec with EXT3 but only about 20MB/sec with GFS. Thanks, Raj On Wed, 07 Jun 2006 Wendy Cheng wrote : >Raj Kumar wrote: > >>We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips? >> >Was the "speed" measured by bonnie++ and only by bonnie++ ? > >-- Wendy -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwill at penguincomputing.com Wed Jun 7 00:16:52 2006 From: mwill at penguincomputing.com (Michael Will) Date: Tue, 6 Jun 2006 17:16:52 -0700 Subject: [Linux-cluster] gfs export over nfs is very slow Message-ID: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com> Try xfs which requires centosplus or suse sles9 -----Original Message----- From: Raj Kumar [mailto:rajkum2002 at rediffmail.com] Sent: Tue Jun 06 13:56:26 2006 To: Wendy Cheng Cc: linux clustering Subject: Re: Re: [Linux-cluster] gfs export over nfs is very slow No. I am measuring read speed. I used command "time cat test* > /dev/null" to time reading 400 files. All our files are 33MB in size. Our processing applications reads many such files in real time and processes them. The app is falling behind because the read speed is less than 20MB/sec but it got to be able to read at least 2 files per second on average... so we need about 50-60Mb/sec transfer rate on our NFS clients. We could get about 45MB/sec with EXT3 but only about 20MB/sec with GFS. Thanks, Raj On Wed, 07 Jun 2006 Wendy Cheng wrote : >Raj Kumar wrote: > >>We are using GFS6.0 and RHEL3 servers. I have been using "async" for all my NFS exports. The speed that I reported is also using "async" option. Any other tips? >> >Was the "speed" measured by bonnie++ and only by bonnie++ ? 
> >-- Wendy -------------- next part -------------- An HTML attachment was scrubbed... URL: From wcheng at redhat.com Wed Jun 7 03:48:19 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Tue, 06 Jun 2006 23:48:19 -0400 Subject: [Linux-cluster] gfs export over nfs is very slow In-Reply-To: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com> References: <433093DF7AD7444DA65EFAFE3987879C125DBF@jellyfish.highlyscyld.com> Message-ID: <1149652100.30034.23.camel@localhost.localdomain> On Tue, 2006-06-06 at 17:16 -0700, Michael Will wrote: > Try xfs which requires centosplus or suse sles9 XFS is not a cluster filesystem - unless you're going to pay for CXFS which is not open source. And be aware that each filesystem has its own strength and weakness. Adding different run time system configurations, when unexpected problems occur, collaborate efforts to trouble-shoot and/or improve the issues are the vital steps to make open source projects work, IMHO. As usual, above are my personal opinions - it is not necessarily my management team's position. -- Wendy From john at turbocorp.com Wed Jun 7 13:53:56 2006 From: john at turbocorp.com (John R. Allgood) Date: Wed, 07 Jun 2006 09:53:56 -0400 Subject: [Linux-cluster] Redhat Cluster Suite and PostgreSQL Message-ID: <4486DA74.5060605@turbocorp.com> Hello I am new to this list and wanting to know if anyone is running Redhat Cluster Suite 3 and PostreSQL. We are converting from a Progress Database to a PostgreSQL Database. I have mulitple postmasters running various databases and each database is defined as a service under the cluster suite. I am running Redhat ES 3.0 Update 7 using Dual Opterons with 8GB RAM. I am using a failover solution so I have two servers primary/secondary and a shared data silo connected via fibre. I have various questions regarding clustering using the above mentioned cluster suite. One of my first questions is does the each service defined under the cluster suite need a seperate service ip. Also I have remote power switches installed on this server does this replace using software watchdog timers or is this used in conjunction with. Thanks John Allgood -- I see the eigenvalue in thine eye, I hear the tender tensor in thy sigh. Bernoulli would have been content to die Had he but known such _a-squared cos 2(phi)! -- Stanislaw Lem, "Cyberiad" From rajkum2002 at rediffmail.com Wed Jun 7 14:56:46 2006 From: rajkum2002 at rediffmail.com (Raj Kumar) Date: 7 Jun 2006 14:56:46 -0000 Subject: [Linux-cluster] Error: Is lock_gulm running Message-ID: <20060607145646.31606.qmail@webmail17.rediffmail.com> Hello, Sometimes a GFS node doesn't shutdown cleanly. A lot of error messages like "Is lock_gulm running. error_code=111" show up on the console. The node doesn't shutdown so I have to power reset it and after starting the node fsck does the filesystem check. It just takes a lot of time. Is it possible to have the node shutdown cleanly in such cases. We use manual method for fencing one of the nodes. There was a network outage between the master lock server and this node. The node status was set to expire and fence_manual didn't succeed so it couldn't join the cluster after restarting it. fence_ack_manual -s nodeip complained there is no /tmp/fifo.tmp file. I had to restart the cluster to get this node join the cluster. Is it possible to join the node without restarting the cluster when it happens again? Thanks, Raj -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwill at penguincomputing.com Wed Jun 7 15:29:59 2006 From: mwill at penguincomputing.com (Michael Will) Date: Wed, 7 Jun 2006 08:29:59 -0700 Subject: [Linux-cluster] gfs export over nfs is very slow Message-ID: <433093DF7AD7444DA65EFAFE3987879C125DC1@jellyfish.highlyscyld.com> The comment was geared towards the ext3+nfs performance question because it is likely that xfs+nfs can meet the required bandwidth. Gfs is a good tool for increasing resiliency but not for increasing throughput (yet). Michael -----Original Message----- From: Wendy Cheng [mailto:wcheng at redhat.com] Sent: Tue Jun 06 20:37:56 2006 To: linux-cluster at redhat.com Subject: Re: Re: [Linux-cluster] gfs export over nfs is very slow On Tue, 2006-06-06 at 17:16 -0700, Michael Will wrote: > Try xfs which requires centosplus or suse sles9 XFS is not a cluster filesystem - unless you're going to pay for CXFS which is not open source. And be aware that each filesystem has its own strength and weakness. Adding different run time system configurations, when unexpected problems occur, collaborate efforts to trouble-shoot and/or improve the issues are the vital steps to make open source projects work, IMHO. As usual, above are my personal opinions - it is not necessarily my management team's position. -- Wendy -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From teigland at redhat.com Wed Jun 7 17:27:40 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 7 Jun 2006 12:27:40 -0500 Subject: [Linux-cluster] [gfs_controld] send messages through separate cpg Message-ID: <20060607172740.GA18684@redhat.com> [new process requires all work to be sent to ml prior to cvs check-in] Set up a separate cpg for sending messages (e.g. for processing mount/unmount) instead of sending them through the cpg used to represent the mount group. Since we apply cpg changes to the mount group async, that cpg won't always contain all the nodes we need to process the mount/unmount. A mount from one node in parallel with unmount from another often won't work without this. 
diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/Makefile cluster/gfs/lock_dlm/daemon/Makefile --- cluster-HEAD/gfs/lock_dlm/daemon/Makefile 2006-03-27 01:31:46.000000000 -0600 +++ cluster/gfs/lock_dlm/daemon/Makefile 2006-06-06 17:19:40.740421037 -0500 @@ -21,6 +21,7 @@ -I../../include/ \ -I../../../group/lib/ \ -I../../../cman/lib/ \ + -I../../../cman/daemon/openais/trunk/include/ \ -I../../../dlm/lib/ \ -I../../../gfs-kernel/src/dlm/ @@ -33,12 +34,14 @@ gfs_controld: main.o \ member_cman.o \ + cpg.o \ group.o \ plock.o \ recover.o \ withdraw.o \ ../../../dlm/lib/libdlm_lt.a \ ../../../cman/lib/libcman.a \ + ../../../cman/daemon/openais/trunk/lib/libcpg.a \ ../../../group/lib/libgroup.a $(CC) $(LDFLAGS) -o $@ $^ @@ -49,6 +52,9 @@ member_cman.o: member_cman.c $(CC) $(CFLAGS) -c -o $@ $< +cpg.o: cpg.c + $(CC) $(CFLAGS) -c -o $@ $< + recover.o: recover.c $(CC) $(CFLAGS) -c -o $@ $< diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/cpg.c cluster/gfs/lock_dlm/daemon/cpg.c --- cluster-HEAD/gfs/lock_dlm/daemon/cpg.c 1969-12-31 18:00:00.000000000 -0600 +++ cluster/gfs/lock_dlm/daemon/cpg.c 2006-06-07 11:54:28.478585576 -0500 @@ -0,0 +1,212 @@ +/****************************************************************************** +******************************************************************************* +** +** Copyright (C) 2006 Red Hat, Inc. All rights reserved. +** +** This copyrighted material is made available to anyone wishing to use, +** modify, copy, or redistribute it subject to the terms and conditions +** of the GNU General Public License v.2. +** +******************************************************************************* +******************************************************************************/ + +#include "lock_dlm.h" +#include "cpg.h" + +static cpg_handle_t daemon_handle; +static struct cpg_name daemon_name; +static int got_msg; +static int saved_nodeid; +static int saved_len; +static char saved_data[MAX_MSGLEN]; + +void receive_journals(struct mountgroup *mg, char *buf, int len, int from); +void receive_options(struct mountgroup *mg, char *buf, int len, int from); +void receive_remount(struct mountgroup *mg, char *buf, int len, int from); +void receive_plock(struct mountgroup *mg, char *buf, int len, int from); +void receive_recovery_status(struct mountgroup *mg, char *buf, int len, + int from); +void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from); + + +static void do_deliver(int nodeid, char *data, int len) +{ + struct mountgroup *mg; + struct gdlm_header *hd; + + hd = (struct gdlm_header *) data; + + mg = find_mg(hd->name); + if (!mg) + return; + + hd->version[0] = le16_to_cpu(hd->version[0]); + hd->version[1] = le16_to_cpu(hd->version[1]); + hd->version[2] = le16_to_cpu(hd->version[2]); + hd->type = le16_to_cpu(hd->type); + hd->nodeid = le32_to_cpu(hd->nodeid); + hd->to_nodeid = le32_to_cpu(hd->to_nodeid); + + if (hd->version[0] != GDLM_VER_MAJOR) { + log_error("reject message version %u.%u.%u", + hd->version[0], hd->version[1], hd->version[2]); + return; + } + + /* If there are some group messages between a new node being added to + the cpg group and being added to the app group, the new node should + discard them since they're only relevant to the app group. 
*/ + + if (!mg->last_callback) { + log_group(mg, "discard message type %d len %d from %d", + hd->type, len, nodeid); + return; + } + + switch (hd->type) { + case MSG_JOURNAL: + receive_journals(mg, data, len, nodeid); + break; + + case MSG_OPTIONS: + receive_options(mg, data, len, nodeid); + break; + + case MSG_REMOUNT: + receive_remount(mg, data, len, nodeid); + break; + + case MSG_PLOCK: + receive_plock(mg, data, len, nodeid); + break; + + case MSG_RECOVERY_STATUS: + receive_recovery_status(mg, data, len, nodeid); + break; + + case MSG_RECOVERY_DONE: + receive_recovery_done(mg, data, len, nodeid); + break; + + default: + log_error("unknown message type %d from %d", + hd->type, hd->nodeid); + } +} + +void deliver_cb(cpg_handle_t handle, struct cpg_name *group_name, + uint32_t nodeid, uint32_t pid, void *data, int data_len) +{ + saved_nodeid = nodeid; + saved_len = data_len; + memcpy(saved_data, data, data_len); + got_msg = 1; +} + +void confchg_cb(cpg_handle_t handle, struct cpg_name *group_name, + struct cpg_address *member_list, int member_list_entries, + struct cpg_address *left_list, int left_list_entries, + struct cpg_address *joined_list, int joined_list_entries) +{ +} + +static cpg_callbacks_t callbacks = { + .cpg_deliver_fn = deliver_cb, + .cpg_confchg_fn = confchg_cb, +}; + +int process_cpg(void) +{ + cpg_error_t error; + + got_msg = 0; + saved_len = 0; + saved_nodeid = 0; + memset(saved_data, 0, sizeof(saved_data)); + + error = cpg_dispatch(daemon_handle, CPG_DISPATCH_ONE); + if (error != CPG_OK) { + log_error("cpg_dispatch error %d", error); + return -1; + } + + if (got_msg) + do_deliver(saved_nodeid, saved_data, saved_len); + return 0; +} + +int setup_cpg(void) +{ + cpg_error_t error; + int fd = 0; + + error = cpg_initialize(&daemon_handle, &callbacks); + if (error != CPG_OK) { + log_error("cpg_initialize error %d", error); + return -1; + } + + cpg_fd_get(daemon_handle, &fd); + if (fd < 0) + return -1; + + memset(&daemon_name, 0, sizeof(daemon_name)); + strcpy(daemon_name.value, "gfs_controld"); + daemon_name.length = 12; + + retry: + error = cpg_join(daemon_handle, &daemon_name); + if (error == CPG_ERR_TRY_AGAIN) { + log_debug("setup_cpg cpg_join retry"); + sleep(1); + goto retry; + } + if (error != CPG_OK) { + log_error("cpg_join error %d", error); + cpg_finalize(daemon_handle); + return -1; + } + + log_debug("cpg %d", fd); + return fd; +} + +static int _send_message(cpg_handle_t h, void *buf, int len) +{ + struct iovec iov; + cpg_error_t error; + int retries = 0; + + iov.iov_base = buf; + iov.iov_len = len; + + retry: + error = cpg_mcast_joined(h, CPG_TYPE_AGREED, &iov, 1); + if (error != CPG_OK) + log_error("cpg_mcast_joined error %d handle %llx", error, h); + if (error == CPG_ERR_TRY_AGAIN) { + /* FIXME: backoff say .25 sec, .5 sec, .75 sec, 1 sec */ + retries++; + if (retries > 3) + sleep(1); + goto retry; + } + + return 0; +} + +int send_group_message(struct mountgroup *mg, int len, char *buf) +{ + struct gdlm_header *hd = (struct gdlm_header *) buf; + + hd->version[0] = cpu_to_le16(GDLM_VER_MAJOR); + hd->version[1] = cpu_to_le16(GDLM_VER_MINOR); + hd->version[2] = cpu_to_le16(GDLM_VER_PATCH); + hd->type = cpu_to_le16(hd->type); + hd->nodeid = cpu_to_le32(hd->nodeid); + hd->to_nodeid = cpu_to_le32(hd->to_nodeid); + memcpy(hd->name, mg->name, strlen(mg->name)); + + return _send_message(daemon_handle, buf, len); +} + diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/group.c cluster/gfs/lock_dlm/daemon/group.c --- cluster-HEAD/gfs/lock_dlm/daemon/group.c 2006-06-07 
12:10:32.102338261 -0500 +++ cluster/gfs/lock_dlm/daemon/group.c 2006-06-06 17:23:06.523976113 -0500 @@ -21,25 +21,14 @@ static int cb_event_nr; static unsigned int cb_id; static int cb_type; -static int cb_nodeid; -static int cb_len; static int cb_member_count; static int cb_members[MAX_GROUP_MEMBERS]; -static char cb_message[MAX_MSGLEN+1]; int do_stop(struct mountgroup *mg); int do_finish(struct mountgroup *mg); int do_terminate(struct mountgroup *mg); int do_start(struct mountgroup *mg, int type, int count, int *nodeids); -void receive_journals(struct mountgroup *mg, char *buf, int len, int from); -void receive_options(struct mountgroup *mg, char *buf, int len, int from); -void receive_remount(struct mountgroup *mg, char *buf, int len, int from); -void receive_plock(struct mountgroup *mg, char *buf, int len, int from); -void receive_recovery_status(struct mountgroup *mg, char *buf, int len, - int from); -void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from); - static void stop_cbfn(group_handle_t h, void *private, char *name) { @@ -87,17 +76,9 @@ static void deliver_cbfn(group_handle_t h, void *private, char *name, int nodeid, int len, char *buf) { - int n; - cb_action = DO_DELIVER; - strncpy(cb_name, name, MAX_GROUP_NAME_LEN); - cb_nodeid = nodeid; - cb_len = n = len; - if (len > MAX_MSGLEN) - n = MAX_MSGLEN; - memcpy(&cb_message, buf, n); } -group_callbacks_t callbacks = { +static group_callbacks_t callbacks = { stop_cbfn, start_cbfn, finish_cbfn, @@ -106,53 +87,6 @@ deliver_cbfn }; -static void do_deliver(struct mountgroup *mg) -{ - struct gdlm_header *hd; - - hd = (struct gdlm_header *) cb_message; - - /* If there are some group messages between a new node being added to - the cpg group and being added to the app group, the new node should - discard them since they're only relevant to the app group. 
*/ - - if (!mg->last_callback) { - log_group(mg, "discard message type %d len %d from %d", - hd->type, cb_len, cb_nodeid); - return; - } - - switch (hd->type) { - case MSG_JOURNAL: - receive_journals(mg, cb_message, cb_len, cb_nodeid); - break; - - case MSG_OPTIONS: - receive_options(mg, cb_message, cb_len, cb_nodeid); - break; - - case MSG_REMOUNT: - receive_remount(mg, cb_message, cb_len, cb_nodeid); - break; - - case MSG_PLOCK: - receive_plock(mg, cb_message, cb_len, cb_nodeid); - break; - - case MSG_RECOVERY_STATUS: - receive_recovery_status(mg, cb_message, cb_len, cb_nodeid); - break; - - case MSG_RECOVERY_DONE: - receive_recovery_done(mg, cb_message, cb_len, cb_nodeid); - break; - - default: - log_error("unknown message type %d from %d", - hd->type, hd->nodeid); - } -} - char *str_members(void) { static char buf[MAXLINE]; @@ -222,12 +156,6 @@ mg->id = cb_id; break; - case DO_DELIVER: - log_debug("groupd callback: deliver %s len %d nodeid %d", - cb_name, cb_len, cb_nodeid); - do_deliver(mg); - break; - default: error = -EINVAL; } @@ -257,15 +185,3 @@ return rv; } -int send_group_message(struct mountgroup *mg, int len, char *buf) -{ - int error; - - error = group_send(gh, mg->name, len, buf); - if (error < 0) - log_error("group_send error %d errno %d", error, errno); - else - error = 0; - return error; -} - diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h cluster/gfs/lock_dlm/daemon/lock_dlm.h --- cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h 2006-05-25 14:30:40.000000000 -0500 +++ cluster/gfs/lock_dlm/daemon/lock_dlm.h 2006-06-06 17:18:25.510916543 -0500 @@ -201,11 +201,16 @@ MSG_RECOVERY_DONE, }; +#define GDLM_VER_MAJOR 1 +#define GDLM_VER_MINOR 0 +#define GDLM_VER_PATCH 0 + struct gdlm_header { uint16_t version[3]; uint16_t type; /* MSG_ */ uint32_t nodeid; /* sender */ uint32_t to_nodeid; /* 0 if to all */ + char name[MAXNAME]; }; @@ -214,6 +219,8 @@ int setup_cman(void); int process_cman(void); +int setup_cpg(void); +int process_cpg(void); int setup_groupd(void); int process_groupd(void); int setup_libdlm(void); diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/main.c cluster/gfs/lock_dlm/daemon/main.c --- cluster-HEAD/gfs/lock_dlm/daemon/main.c 2006-04-21 14:54:10.000000000 -0500 +++ cluster/gfs/lock_dlm/daemon/main.c 2006-06-07 11:59:12.248223925 -0500 @@ -25,6 +25,7 @@ static struct pollfd pollfd[MAX_CLIENTS]; static int cman_fd; +static int cpg_fd; static int listen_fd; static int groupd_fd; static int uevent_fd; @@ -249,6 +250,11 @@ goto out; client_add(cman_fd, &maxi); + rv = cpg_fd = setup_cpg(); + if (rv < 0) + goto out; + client_add(cpg_fd, &maxi); + rv = groupd_fd = setup_groupd(); if (rv < 0) goto out; @@ -272,6 +278,8 @@ goto out; client_add(plocks_fd, &maxi); + log_debug("setup done"); + for (;;) { rv = poll(pollfd, maxi + 1, -1); if (rv < 0) @@ -296,6 +304,8 @@ process_groupd(); else if (pollfd[i].fd == cman_fd) process_cman(); + else if (pollfd[i].fd == cpg_fd) + process_cpg(); else if (pollfd[i].fd == uevent_fd) process_uevent(); else if (!no_withdraw && @@ -310,7 +320,6 @@ if (pollfd[i].revents & POLLHUP) { if (pollfd[i].fd == cman_fd) exit_cman(); - log_debug("closing fd %d", pollfd[i].fd); close(pollfd[i].fd); } } From teigland at redhat.com Thu Jun 8 18:49:42 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 8 Jun 2006 13:49:42 -0500 Subject: [Linux-cluster] post_fail_delay In-Reply-To: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net> References: <20060606152301.qrmkr58ls8kcs4ss@web.obsidianonline.net> Message-ID: 
<20060608184942.GA6203@redhat.com> On Tue, Jun 06, 2006 at 03:23:01PM +0200, riaan at obsidian.co.za wrote: > I would like for an errant GFS node to be able to create network/disk > dumps before being power fenced. Am I missing something, or is this > leaving the errant node unfenced for any significant amount of time > (enough to complete the dump, assuming it is upwards of a few seconds) > just a bad idea? No, adding a delay before fencing is just fine, it just prolongs the time until other stuff can be recovered and used normally again. > AFAIUnderstand, the whole idea of fencing is to prevent the node from > damaging the file system in the first place, making the collection of > dumps and power fencing fundamentally at odds with each other. The only way the failed node is going to damage anything is if it happens to write to the fs after its journal has been recovered. That's why the only requirement for fencing is that it happens prior to gfs journal recovery. If a failed node writes to the fs before journal recovery it's no problem. If you want a failed node to disk/net-dump, then set post_fail_delay to some number of seconds just greater than the typical time a dump takes. Dave From tmelhiser at hotmail.com Thu Jun 8 19:31:30 2006 From: tmelhiser at hotmail.com (Travis Melhiser) Date: Thu, 08 Jun 2006 15:31:30 -0400 Subject: [Linux-cluster] Oracle 10GR2 on GFS Message-ID: Is there any way to get 10GR2 to go past the ocrconfig script error: OCRFile is on FS type 18225520. Not supported. -Travis _________________________________________________________________ Don't just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/ From vcmarti at sph.emory.edu Thu Jun 8 20:17:09 2006 From: vcmarti at sph.emory.edu (Vernard C. Martin) Date: Thu, 08 Jun 2006 16:17:09 -0400 Subject: [Linux-cluster] Error starting up CLVMD Message-ID: <448885C5.4050505@sph.emory.edu> I recently upgraded from an old version of GFS from May of last year to the latest stable version in the CVS tree. I did this because it could compile against the latest kernel in RHEL4U3. It's a two-node cluster and had been exhibiting some crashes under heavy I/O load, so I thought that the upgrade might help stabilize it. The first node came up fine but the 2nd node is giving me a strange error when trying to start up "clvmd". The error is: [root at node001 ~]# clvmd clvmd could not connect to cluster manager Consult syslog for more information [root at node001 ~]# the syslog has: Jun 8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No such file or directory So exactly which file is it talking about, so that I can make sure that it's there? Any help would be appreciated. From rpeterso at redhat.com Thu Jun 8 21:16:59 2006 From: rpeterso at redhat.com (Robert S Peterson) Date: Thu, 08 Jun 2006 16:16:59 -0500 Subject: [Linux-cluster] Updates to libgfs2 Message-ID: <1149801419.12291.9.camel@technetium.msp.redhat.com> Hi, I just wanted to let you know: I made some bug fixes to libgfs2 for problems with fsck. The following is a patch with the code changes. Also, there were some parts that got missed from the original commit that are there now. 
Regards, Bob Peterson Red Hat Cluster Suite Index: buf.c =================================================================== RCS file: /cvs/cluster/cluster/gfs2/libgfs2/buf.c,v retrieving revision 1.2 diff -w -u -p -u -p -r1.2 buf.c --- buf.c 6 Jun 2006 14:20:41 -0000 1.2 +++ buf.c 8 Jun 2006 20:58:48 -0000 @@ -188,15 +188,17 @@ void bsync(struct gfs2_sbd *sdp) /* commit buffers to disk but do not discard */ void bcommit(struct gfs2_sbd *sdp) { - osi_list_t *tmp; + osi_list_t *tmp, *x; struct gfs2_buffer_head *bh; - osi_list_foreach(tmp, &sdp->buf_list) { + osi_list_foreach_safe(tmp, &sdp->buf_list, x) { bh = osi_list_entry(tmp, struct gfs2_buffer_head, b_list); - if (bh->b_changed) { + if (!bh->b_count) /* if not reserved for later */ + write_buffer(sdp, bh); /* write the data, free the memory */ + else if (bh->b_changed) { /* if buffer has changed */ do_lseek(sdp, bh->b_blocknr * sdp->bsize); - do_write(sdp, bh->b_data, sdp->bsize); - bh->b_changed = FALSE; + do_write(sdp, bh->b_data, sdp->bsize); /* write it out */ + bh->b_changed = FALSE; /* no longer changed */ } } } Index: fs_ops.c =================================================================== RCS file: /cvs/cluster/cluster/gfs2/libgfs2/fs_ops.c,v retrieving revision 1.2 diff -w -u -p -u -p -r1.2 fs_ops.c --- fs_ops.c 6 Jun 2006 14:20:41 -0000 1.2 +++ fs_ops.c 8 Jun 2006 20:58:49 -0000 @@ -502,14 +502,12 @@ int gfs2_readi(struct gfs2_inode *ip, vo return copied; } -static void -copy_from_mem(struct gfs2_buffer_head *bh, void **buf, +static void copy_from_mem(struct gfs2_buffer_head *bh, void **buf, unsigned int offset, unsigned int size) { char **p = (char **)buf; memcpy(bh->b_data + offset, *p, size); - *p += size; } @@ -526,7 +524,6 @@ int gfs2_writei(struct gfs2_inode *ip, v int isdir = !!(S_ISDIR(ip->i_di.di_flags)); const uint64_t start = offset; int copied = 0; - enum update_flags f; if (!size) return 0; @@ -558,7 +555,6 @@ int gfs2_writei(struct gfs2_inode *ip, v block_map(ip, lblock, &new, &dblock, &extlen); } - f = not_updated; if (new) { bh = bget(sdp, dblock); if (isdir) { @@ -567,12 +563,11 @@ int gfs2_writei(struct gfs2_inode *ip, v mh.mh_type = GFS2_METATYPE_JD; mh.mh_format = GFS2_FORMAT_JD; gfs2_meta_header_out(&mh, bh->b_data); - f = updated; } } else bh = bread(sdp, dblock); copy_from_mem(bh, &buf, o, amount); - brelse(bh, f); + brelse(bh, updated); copied += amount; lblock++; @@ -1084,8 +1079,7 @@ dir_make_exhash(struct gfs2_inode *dip) dip->i_di.di_depth = y; } -static void -dir_l_add(struct gfs2_inode *dip, char *filename, int len, +static void dir_l_add(struct gfs2_inode *dip, char *filename, int len, struct gfs2_inum *inum, unsigned int type) { struct gfs2_dirent *dent; @@ -1564,11 +1558,10 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui struct gfs2_inode *ip; struct gfs2_buffer_head *bh; int x; - uint64_t p, freed_blocks; + uint64_t p; unsigned char *buf; struct rgrp_list *rgd; - freed_blocks = 0; bh = bread(sdp, block); ip = inode_get(sdp, bh); if (ip->i_di.di_height > 0) { @@ -1578,14 +1571,19 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui x += sizeof(uint64_t)) { p = be64_to_cpu(*(uint64_t *)(buf + x)); if (p) { - freed_blocks++; gfs2_set_bitmap(sdp, p, GFS2_BLKST_FREE); + /* We need to adjust the free space count for the freed */ + /* indirect block. 
*/ + rgd = gfs2_blk2rgrpd(sdp, p); /* find the rg for indir block */ + bh = bget(sdp, rgd->ri.ri_addr); /* get the buffer its rg */ + rgd->rg.rg_free++; /* adjust the free count */ + gfs2_rgrp_out(&rgd->rg, bh->b_data); /* back to the buffer */ + brelse(bh, updated); /* release the buffer */ } } } /* Set the bitmap type for inode to free space: */ gfs2_set_bitmap(sdp, ip->i_di.di_num.no_addr, GFS2_BLKST_FREE); - freed_blocks++; /* one for the inode itself */ inode_put(ip, updated); /* Now we have to adjust the rg freespace count and inode count: */ rgd = gfs2_blk2rgrpd(sdp, block); @@ -1593,7 +1591,7 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui /* buffer in memory for the rg on disk because we used it to fix the */ /* bitmaps, some of which are on the same block on disk. */ bh = bread(sdp, rgd->ri.ri_addr); /* get the buffer */ - rgd->rg.rg_free += freed_blocks; + rgd->rg.rg_free++; rgd->rg.rg_dinodes--; /* one less inode in use */ gfs2_rgrp_out(&rgd->rg, bh->b_data); brelse(bh, updated); /* release the buffer */ From rpeterso at redhat.com Thu Jun 8 21:26:41 2006 From: rpeterso at redhat.com (Robert S Peterson) Date: Thu, 08 Jun 2006 16:26:41 -0500 Subject: [Linux-cluster] Patch to gfs2_convert Message-ID: <1149802001.12291.15.camel@technetium.msp.redhat.com> Hi, This patch to gfs2_convert makes it much more forgiving when fs conversions are interrupted in the middle due to power loss, interrupts, or other reasons. Now, if a filesystem conversion is interrupted mid-way through, the tool should be able to pick up where it left off without damage. As always, send questions, comments and concerns to me. If I don't hear from anybody, I'll commit it to cvs in a few days. Regards, Bob Peterson Red Hat Cluster Suite Index: gfs2_convert.c =================================================================== RCS file: /cvs/cluster/cluster/gfs2/convert/gfs2_convert.c,v retrieving revision 1.2 diff -w -u -p -u -p -r1.2 gfs2_convert.c --- gfs2_convert.c 6 Jun 2006 14:37:47 -0000 1.2 +++ gfs2_convert.c 8 Jun 2006 21:13:37 -0000 @@ -77,12 +77,14 @@ void convert_bitmaps(struct gfs2_sbd *sd int x, y; struct gfs2_rindex *ri; unsigned char state; + struct gfs2_buffer_head *bh; ri = &rgd2->ri; gfs2_compute_bitstructs(sdp, rgd2); /* mallocs bh as array */ for (blk = 0; blk < ri->ri_length; blk++) { - rgd2->bh[blk] = bget_generic(sdp, ri->ri_addr + blk, read_disk, - read_disk); + bh = bget_generic(sdp, ri->ri_addr + blk, read_disk, read_disk); + if (!rgd2->bh[blk]) + rgd2->bh[blk] = bh; x = (blk) ? 
sizeof(struct gfs2_meta_header) : sizeof(struct gfs2_rgrp); for (; x < sdp->bsize; x++) @@ -92,7 +94,6 @@ void convert_bitmaps(struct gfs2_sbd *sd if (state == 0x02) /* unallocated metadata state invalid */ rgd2->bh[blk]->b_data[x] &= ~(0x02 << (GFS2_BIT_SIZE * y)); } - brelse(rgd2->bh[blk], updated); } }/* convert_bitmaps */ @@ -134,10 +135,8 @@ static int superblock_cvt(int disk_fd, c /* convert the ondisk sb structure */ /* --------------------------------- */ sb2->sd_sb.sb_header.mh_magic = GFS2_MAGIC; - sb2->sd_sb.sb_fs_format = GFS2_FORMAT_FS; sb2->sd_sb.sb_header.mh_type = GFS2_METATYPE_SB; sb2->sd_sb.sb_header.mh_format = GFS2_FORMAT_SB; - sb2->sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI; sb2->sd_sb.sb_bsize = sb1->sd_sb.sb_bsize; sb2->sd_sb.sb_bsize_shift = sb1->sd_sb.sb_bsize_shift; strcpy(sb2->sd_sb.sb_lockproto, sb1->sd_sb.sb_lockproto); @@ -174,14 +173,14 @@ static int superblock_cvt(int disk_fd, c rgd2->ri.ri_data0 = rgd->rd_ri.ri_data1; rgd2->ri.ri_data = rgd->rd_ri.ri_data; rgd2->ri.ri_bitbytes = rgd->rd_ri.ri_bitbytes; - /* commit the changes to a gfs2 buffer */ - bh = bread(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */ - gfs2_rgrp_out(&rgd2->rg, bh->b_data); - brelse(bh, updated); /* release the buffer */ /* Add the new gfs2 rg to our list: We'll output the index later. */ osi_list_add_prev((osi_list_t *)&rgd2->list, (osi_list_t *)&sb2->rglist); convert_bitmaps(sb2, rgd2, TRUE); + /* Write the updated rgrp to the gfs2 buffer */ + bh = bget(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */ + gfs2_rgrp_out(&rgd2->rg, rgd2->bh[0]->b_data); + brelse(bh, updated); /* release the buffer */ } return 0; }/* superblock_cvt */ @@ -195,8 +194,12 @@ int adjust_inode(struct gfs2_sbd *sbp, s { struct gfs2_inode *inode; struct inode_block *fixdir; + int inode_was_gfs1; inode = inode_get(sbp, bh); + + inode_was_gfs1 = (inode->i_di.di_num.no_formal_ino == + inode->i_di.di_num.no_addr); /* Fix the inode number: */ inode->i_di.di_num.no_formal_ino = sbp->md.next_inum; ; @@ -240,11 +243,23 @@ int adjust_inode(struct gfs2_sbd *sbp, s /* di_goal_meta has shifted locations and di_goal_data has */ /* changed from 32-bits to 64-bits. The following code */ /* adjusts for the shift. */ + /* */ + /* Note: It may sound absurd, but we need to check if this */ + /* inode has already been converted to gfs2 or if it's */ + /* still a gfs1 inode. That's just in case there was a */ + /* prior attempt to run gfs2_convert that never finished */ + /* (due to power out, ctrl-c, kill, segfault, whatever.) */ + /* If it is unconverted gfs1 we want to do a full */ + /* conversion. If it's a gfs2 inode from a prior run, */ + /* we still need to renumber the inode, but here we */ + /* don't want to shift the data around. 
*/ /* ----------------------------------------------------------- */ + if (inode_was_gfs1) { inode->i_di.di_goal_meta = inode->i_di.di_goal_data; inode->i_di.di_goal_data = 0; /* make sure the upper 32b are 0 */ inode->i_di.di_goal_data = inode->i_di.__pad[0]; inode->i_di.__pad[1] = 0; + } gfs2_dinode_out(&inode->i_di, bh->b_data); sbp->md.next_inum++; /* update inode count */ @@ -344,7 +359,7 @@ int inode_renumber(struct gfs2_sbd *sbp, /* ------------------------------------------------------------------------- */ /* fetch_inum - fetch an inum entry from disk, given its block */ /* ------------------------------------------------------------------------- */ -int fetch_and_fix_inum(struct gfs2_sbd *sbp, uint64_t iblock, +int fetch_inum(struct gfs2_sbd *sbp, uint64_t iblock, struct gfs2_inum *inum) { struct gfs2_buffer_head *bh_fix; @@ -356,7 +371,7 @@ int fetch_and_fix_inum(struct gfs2_sbd * inum->no_addr = fix_inode->i_di.di_num.no_addr; brelse(bh_fix, updated); return 0; -}/* fetch_and_fix_inum */ +}/* fetch_inum */ /* ------------------------------------------------------------------------- */ /* process_dirent_info - fix one dirent (directory entry) buffer */ @@ -382,6 +397,7 @@ int process_dirent_info(struct gfs2_inod /* Turns out you can't trust dir_entries is correct. */ for (de = 0; ; de++) { struct gfs2_inum inum; + int dent_was_gfs1; gettimeofday(&tv, NULL); /* Do more warm fuzzy stuff for the customer. */ @@ -394,18 +410,24 @@ int process_dirent_info(struct gfs2_inod } /* fix the dirent's inode number based on the inode */ gfs2_inum_in(&inum, (char *)&dent->de_inum); + dent_was_gfs1 = (dent->de_inum.no_addr == dent->de_inum.no_formal_ino); if (inum.no_formal_ino) { /* if not a sentinel (placeholder) */ - error = fetch_and_fix_inum(sbp, inum.no_addr, &inum); + error = fetch_inum(sbp, inum.no_addr, &inum); if (error) { printf("Error retrieving inode %" PRIx64 "\n", inum.no_addr); break; } + /* fix the dirent's inode number from the fetched inum. */ + dent->de_inum.no_formal_ino = cpu_to_be64(inum.no_formal_ino); } /* Fix the dirent's filename hash: They are the same as gfs1 */ /* dent->de_hash = cpu_to_be32(gfs2_disk_hash((char *)(dent + 1), */ /* be16_to_cpu(dent->de_name_len))); */ /* Fix the dirent's file type. Gfs1 used home-grown values. */ /* Gfs2 uses standard values from include/linux/fs.h */ + /* Only do this if the dent was a true gfs1 dent, and not a */ + /* gfs2 dent converted from a previously aborted run. 
*/ + if (dent_was_gfs1) { switch be16_to_cpu(dent->de_type) { case GFS_FILE_NON: dent->de_type = cpu_to_be16(DT_UNKNOWN); @@ -432,7 +454,7 @@ int process_dirent_info(struct gfs2_inod dent->de_type = cpu_to_be16(DT_SOCK); break; } - + } error = gfs2_dirent_next(dip, bh, &dent); if (error) break; @@ -948,26 +970,33 @@ int main(int argc, char **argv) inode_put(sb2.md.inum, updated); inode_put(sb2.md.statfs, updated); - bh = bread(&sb2, sb2.sb_addr); - gfs2_sb_out(&sb2.sd_sb, bh->b_data); - brelse(bh, updated); bcommit(&sb2); /* write the buffers to disk */ /* Now delete the now-obsolete gfs1 files: */ printf("Removing obsolete gfs1 structures.\n"); fflush(stdout); - /* Delete the Journal index: */ + /* Delete the old gfs1 Journal index: */ gfs2_freedi(&sb2, sb.sd_sb.sb_jindex_di.no_addr); - /* Delete the rgindex: */ + /* Delete the old gfs1 rgindex: */ gfs2_freedi(&sb2, sb.sd_sb.sb_rindex_di.no_addr); - /* Delete the Quota file: */ + /* Delete the old gfs1 Quota file: */ gfs2_freedi(&sb2, sb.sd_sb.sb_quota_di.no_addr); - /* Delete the License file: */ + /* Delete the old gfs1 License file: */ gfs2_freedi(&sb2, sb.sd_sb.sb_license_di.no_addr); - /* Now free all the rgrps */ + /* Now free all the in memory */ gfs2_rgrp_free(&sb2, updated); printf("Committing changes to disk.\n"); fflush(stdout); + /* Set filesystem type in superblock to gfs2. We do this at the */ + /* end because if the tool is interrupted in the middle, we want */ + /* it to not reject the partially converted fs as already done */ + /* when it's run a second time. */ + bh = bread(&sb2, sb2.sb_addr); + sb2.sd_sb.sb_fs_format = GFS2_FORMAT_FS; + sb2.sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI; + gfs2_sb_out(&sb2.sd_sb, bh->b_data); + brelse(bh, updated); + bsync(&sb2); /* write the buffers to disk */ error = fsync(disk_fd); if (error) From rpeterso at redhat.com Thu Jun 8 22:16:33 2006 From: rpeterso at redhat.com (Robert S Peterson) Date: Thu, 08 Jun 2006 17:16:33 -0500 Subject: [Linux-cluster] Error starting up CLVMD In-Reply-To: <448885C5.4050505@sph.emory.edu> References: <448885C5.4050505@sph.emory.edu> Message-ID: <1149804993.12291.27.camel@technetium.msp.redhat.com> On Thu, 2006-06-08 at 16:17 -0400, Vernard C. Martin wrote: > The first node came up fine but the 2nd node is giving me a strange > error when trying to start up "clvmd". The error is:[root at node001 ~]# clvmd > clvmd could not connect to cluster manager > Consult syslog for more information > [root at node001 ~]# > > the syslog has: > Jun 8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No > such file or directory Hi Vernard, I'm not very knowledgeable in the ways of lvm, however, you may want to check to make sure that lock_dlm.ko is loaded (by using lsmod). I don't know the code, but I'm guessing it's trying to create a lock space by opening one of the dlm kernel devices (/dev/dlm*) which should be controlled by the lock_dlm device driver. If that's not loaded, it will fail. Also, make sure the second box can physically see the SAN in /proc/partitions. I've seen some weird things like this happen when a cluster comes up but some of the nodes can't physically access the SAN. I hope this helps. 
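For example, a quick set of checks along those lines (the module and device names here are approximate and vary a bit between releases, so treat this as a sketch rather than the exact procedure) would be:

lsmod | grep dlm                 # is the DLM kernel module actually loaded?
grep dlm /proc/misc              # does the kernel expose a dlm control device?
ls -l /dev/dlm* /dev/misc/dlm*   # device nodes, depending on how udev is set up
cat /proc/partitions             # can this node physically see the shared storage?
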
Regards, Bob Peterson Red Hat Cluster Suite From Olivier.Thibault at lmpt.univ-tours.fr Fri Jun 9 12:26:19 2006 From: Olivier.Thibault at lmpt.univ-tours.fr (Olivier Thibault) Date: Fri, 09 Jun 2006 14:26:19 +0200 Subject: [Linux-cluster] gfs_tool gettune Message-ID: <448968EB.6020302@lmpt.univ-tours.fr> Hello, I am testing GFS 6.1 and have a question about the gettune command of gfs_tool. If I do gfs_tool setflag inherit_directio my_directory then gfs_tool gettune my_directory It displays: new_files_directio = 0 It is the same thing with the inherit_jdata flag and new_files_jdata So my question is: is there any relation between these flags and what gettune displays? Shouldn't it display "new_files_directio = 1"? However, it seems that it impacts on the filesystem as my tests behave differently depending on these flags. So, what are the tuneable options new_files_directio and new_files_jdata? Is there somewhere any doc about all the tuneable parameters? Best regards, Olivier -- Olivier THIBAULT Laboratoire de Mathématiques et Physique Théorique (UMR CNRS 6083) Université François Rabelais Parc de Grandmont - 37200 TOURS Tél: +33 2 47 36 69 12 Fax: +33 2 47 36 69 56 From pcaulfie at redhat.com Fri Jun 9 13:17:11 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Fri, 09 Jun 2006 14:17:11 +0100 Subject: [Linux-cluster] Error starting up CLVMD In-Reply-To: <1149804993.12291.27.camel@technetium.msp.redhat.com> References: <448885C5.4050505@sph.emory.edu> <1149804993.12291.27.camel@technetium.msp.redhat.com> Message-ID: <448974D7.7050801@redhat.com> Robert S Peterson wrote: > On Thu, 2006-06-08 at 16:17 -0400, Vernard C. Martin wrote: >> The first node came up fine but the 2nd node is giving me a strange >> error when trying to start up "clvmd". The error is:[root at node001 ~]# clvmd >> clvmd could not connect to cluster manager >> Consult syslog for more information >> [root at node001 ~]# >> >> the syslog has: >> Jun 8 16:04:16 node001 clvmd: Unable to create lockspace for CLVM: No >> such file or directory > > Hi Vernard, > > I'm not very knowledgeable in the ways of lvm, however, you may want to > check to make sure that lock_dlm.ko is loaded (by using lsmod). > I don't know the code, but I'm guessing it's trying to create a lock > space by opening one of the dlm kernel devices (/dev/dlm*) which should > be controlled by the lock_dlm device driver. If that's not loaded, it > will fail. Bob's right, it sounds like the DLM isn't loaded. The module name is just "dlm" BTW and the device should show up in /proc/misc and (if udev is running) /dev/misc/dlm-control. lock_dlm is the GFS interface to the DLM...yes, I know it's confusing. -- patrick From rpeterso at redhat.com Fri Jun 9 13:48:18 2006 From: rpeterso at redhat.com (Robert S Peterson) Date: Fri, 09 Jun 2006 08:48:18 -0500 Subject: [Linux-cluster] gfs_tool gettune In-Reply-To: <448968EB.6020302@lmpt.univ-tours.fr> References: <448968EB.6020302@lmpt.univ-tours.fr> Message-ID: <1149860898.3363.2.camel@technetium.msp.redhat.com> On Fri, 2006-06-09 at 14:26 +0200, Olivier Thibault wrote: > Hello, > > I am testing GFS 6.1 and have a question about the gettune command of > gfs_tool. > If I do > gfs_tool setflag inherit_directio my_directory > then > gfs_ttol gettune my_directory > It displays : > new_files_directio = 0 > > It is the same thing with the inherit_jdata flag and new_files_jdata > > So my question is : is there any relation between these flags and what > gettune displays. 
Should'nt it display "new_files_directio = 1" ? > > However, it seems that it impacts on the filesystem as my tests behave > differently depending on these flags. > So, what are the tuneable options new_files_directio and new_files_jdata ? > Is there somewhere any doc about all the tuneable parameters ? > > Best regards, > > Olivier Hi Olivier, Here's what's going on: inherit_directio and new_files_directio are two separate things. If you look at the man page, inherit_directio operates on a single directory whereas new_files_directio is a filesystem-wide "settune" value. If you do: gfs_tool setflag inherit_directio my_directory You're telling the fs that ONLY your directory and all new files within that directory should have this attribute, which is why your tests are acting as expected, as long as you're within that directory. It basically sets an attribute on an in-memory inode for the directory. If instead you were to do: gfs_tool settune new_files_directio 1 The value new_files_directio value would change for the whole mount point, not just that directory. Of course, you're seeing what gfs_tool gettune my_directory is reporting for the global flag. Regards, Bob Peterson Red Hat Cluster Suite From sdake at redhat.com Wed Jun 7 19:24:25 2006 From: sdake at redhat.com (Steven Dake) Date: Wed, 07 Jun 2006 12:24:25 -0700 Subject: [Linux-cluster] [gfs_controld] send messages through separate cpg In-Reply-To: <20060607172740.GA18684@redhat.com> References: <20060607172740.GA18684@redhat.com> Message-ID: <1149708265.18988.3.camel@shih.broked.org> Dave, I'd say the cpg bits look really good except for the mcast operation (where you have a FIXME). I'd recommend not backing off here, but instead spinning on the transmit if ERR_TRY_AGAIN is returned. Even on a heavily loaded system the delay should not be very significant on a spin operation, unless this code has certain timeouts (not sure about that) that would expire. It would appear not since the code suggests backing off using a timer. Regards -steve On Wed, 2006-06-07 at 12:27 -0500, David Teigland wrote: > [new process requires all work to be sent to ml prior to cvs check-in] > > Set up a separate cpg for sending messages (e.g. for processing > mount/unmount) instead of sending them through the cpg used to represent > the mount group. Since we apply cpg changes to the mount group async, > that cpg won't always contain all the nodes we need to process the > mount/unmount. A mount from one node in parallel with unmount from > another often won't work without this. 
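(For illustration, the spin-on-retry approach Steve recommends above would reduce the send path to roughly the following sketch; the handle type, constants and log_error() helper are taken from the cpg.c code quoted below, and this is not the code that was committed.)

/* Illustration only: spin on CPG_ERR_TRY_AGAIN instead of backing off.
   Assumes the same headers and log_error() helper as cpg.c below. */
static int send_message_spin(cpg_handle_t h, void *buf, int len)
{
	struct iovec iov;
	cpg_error_t error;

	iov.iov_base = buf;
	iov.iov_len = len;

	/* keep retrying while openais reports a transient shortage */
	do {
		error = cpg_mcast_joined(h, CPG_TYPE_AGREED, &iov, 1);
	} while (error == CPG_ERR_TRY_AGAIN);

	if (error != CPG_OK) {
		log_error("cpg_mcast_joined error %d", error);
		return -1;
	}
	return 0;
}
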
> > > diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/Makefile cluster/gfs/lock_dlm/daemon/Makefile > --- cluster-HEAD/gfs/lock_dlm/daemon/Makefile 2006-03-27 01:31:46.000000000 -0600 > +++ cluster/gfs/lock_dlm/daemon/Makefile 2006-06-06 17:19:40.740421037 -0500 > @@ -21,6 +21,7 @@ > -I../../include/ \ > -I../../../group/lib/ \ > -I../../../cman/lib/ \ > + -I../../../cman/daemon/openais/trunk/include/ \ > -I../../../dlm/lib/ \ > -I../../../gfs-kernel/src/dlm/ > > @@ -33,12 +34,14 @@ > > gfs_controld: main.o \ > member_cman.o \ > + cpg.o \ > group.o \ > plock.o \ > recover.o \ > withdraw.o \ > ../../../dlm/lib/libdlm_lt.a \ > ../../../cman/lib/libcman.a \ > + ../../../cman/daemon/openais/trunk/lib/libcpg.a \ > ../../../group/lib/libgroup.a > $(CC) $(LDFLAGS) -o $@ $^ > > @@ -49,6 +52,9 @@ > member_cman.o: member_cman.c > $(CC) $(CFLAGS) -c -o $@ $< > > +cpg.o: cpg.c > + $(CC) $(CFLAGS) -c -o $@ $< > + > recover.o: recover.c > $(CC) $(CFLAGS) -c -o $@ $< > > diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/cpg.c cluster/gfs/lock_dlm/daemon/cpg.c > --- cluster-HEAD/gfs/lock_dlm/daemon/cpg.c 1969-12-31 18:00:00.000000000 -0600 > +++ cluster/gfs/lock_dlm/daemon/cpg.c 2006-06-07 11:54:28.478585576 -0500 > @@ -0,0 +1,212 @@ > +/****************************************************************************** > +******************************************************************************* > +** > +** Copyright (C) 2006 Red Hat, Inc. All rights reserved. > +** > +** This copyrighted material is made available to anyone wishing to use, > +** modify, copy, or redistribute it subject to the terms and conditions > +** of the GNU General Public License v.2. > +** > +******************************************************************************* > +******************************************************************************/ > + > +#include "lock_dlm.h" > +#include "cpg.h" > + > +static cpg_handle_t daemon_handle; > +static struct cpg_name daemon_name; > +static int got_msg; > +static int saved_nodeid; > +static int saved_len; > +static char saved_data[MAX_MSGLEN]; > + > +void receive_journals(struct mountgroup *mg, char *buf, int len, int from); > +void receive_options(struct mountgroup *mg, char *buf, int len, int from); > +void receive_remount(struct mountgroup *mg, char *buf, int len, int from); > +void receive_plock(struct mountgroup *mg, char *buf, int len, int from); > +void receive_recovery_status(struct mountgroup *mg, char *buf, int len, > + int from); > +void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from); > + > + > +static void do_deliver(int nodeid, char *data, int len) > +{ > + struct mountgroup *mg; > + struct gdlm_header *hd; > + > + hd = (struct gdlm_header *) data; > + > + mg = find_mg(hd->name); > + if (!mg) > + return; > + > + hd->version[0] = le16_to_cpu(hd->version[0]); > + hd->version[1] = le16_to_cpu(hd->version[1]); > + hd->version[2] = le16_to_cpu(hd->version[2]); > + hd->type = le16_to_cpu(hd->type); > + hd->nodeid = le32_to_cpu(hd->nodeid); > + hd->to_nodeid = le32_to_cpu(hd->to_nodeid); > + > + if (hd->version[0] != GDLM_VER_MAJOR) { > + log_error("reject message version %u.%u.%u", > + hd->version[0], hd->version[1], hd->version[2]); > + return; > + } > + > + /* If there are some group messages between a new node being added to > + the cpg group and being added to the app group, the new node should > + discard them since they're only relevant to the app group. 
*/ > + > + if (!mg->last_callback) { > + log_group(mg, "discard message type %d len %d from %d", > + hd->type, len, nodeid); > + return; > + } > + > + switch (hd->type) { > + case MSG_JOURNAL: > + receive_journals(mg, data, len, nodeid); > + break; > + > + case MSG_OPTIONS: > + receive_options(mg, data, len, nodeid); > + break; > + > + case MSG_REMOUNT: > + receive_remount(mg, data, len, nodeid); > + break; > + > + case MSG_PLOCK: > + receive_plock(mg, data, len, nodeid); > + break; > + > + case MSG_RECOVERY_STATUS: > + receive_recovery_status(mg, data, len, nodeid); > + break; > + > + case MSG_RECOVERY_DONE: > + receive_recovery_done(mg, data, len, nodeid); > + break; > + > + default: > + log_error("unknown message type %d from %d", > + hd->type, hd->nodeid); > + } > +} > + > +void deliver_cb(cpg_handle_t handle, struct cpg_name *group_name, > + uint32_t nodeid, uint32_t pid, void *data, int data_len) > +{ > + saved_nodeid = nodeid; > + saved_len = data_len; > + memcpy(saved_data, data, data_len); > + got_msg = 1; > +} > + > +void confchg_cb(cpg_handle_t handle, struct cpg_name *group_name, > + struct cpg_address *member_list, int member_list_entries, > + struct cpg_address *left_list, int left_list_entries, > + struct cpg_address *joined_list, int joined_list_entries) > +{ > +} > + > +static cpg_callbacks_t callbacks = { > + .cpg_deliver_fn = deliver_cb, > + .cpg_confchg_fn = confchg_cb, > +}; > + > +int process_cpg(void) > +{ > + cpg_error_t error; > + > + got_msg = 0; > + saved_len = 0; > + saved_nodeid = 0; > + memset(saved_data, 0, sizeof(saved_data)); > + > + error = cpg_dispatch(daemon_handle, CPG_DISPATCH_ONE); > + if (error != CPG_OK) { > + log_error("cpg_dispatch error %d", error); > + return -1; > + } > + > + if (got_msg) > + do_deliver(saved_nodeid, saved_data, saved_len); > + return 0; > +} > + > +int setup_cpg(void) > +{ > + cpg_error_t error; > + int fd = 0; > + > + error = cpg_initialize(&daemon_handle, &callbacks); > + if (error != CPG_OK) { > + log_error("cpg_initialize error %d", error); > + return -1; > + } > + > + cpg_fd_get(daemon_handle, &fd); > + if (fd < 0) > + return -1; > + > + memset(&daemon_name, 0, sizeof(daemon_name)); > + strcpy(daemon_name.value, "gfs_controld"); > + daemon_name.length = 12; > + > + retry: > + error = cpg_join(daemon_handle, &daemon_name); > + if (error == CPG_ERR_TRY_AGAIN) { > + log_debug("setup_cpg cpg_join retry"); > + sleep(1); > + goto retry; > + } > + if (error != CPG_OK) { > + log_error("cpg_join error %d", error); > + cpg_finalize(daemon_handle); > + return -1; > + } > + > + log_debug("cpg %d", fd); > + return fd; > +} > + > +static int _send_message(cpg_handle_t h, void *buf, int len) > +{ > + struct iovec iov; > + cpg_error_t error; > + int retries = 0; > + > + iov.iov_base = buf; > + iov.iov_len = len; > + > + retry: > + error = cpg_mcast_joined(h, CPG_TYPE_AGREED, &iov, 1); > + if (error != CPG_OK) > + log_error("cpg_mcast_joined error %d handle %llx", error, h); > + if (error == CPG_ERR_TRY_AGAIN) { > + /* FIXME: backoff say .25 sec, .5 sec, .75 sec, 1 sec */ > + retries++; > + if (retries > 3) > + sleep(1); > + goto retry; > + } > + > + return 0; > +} > + > +int send_group_message(struct mountgroup *mg, int len, char *buf) > +{ > + struct gdlm_header *hd = (struct gdlm_header *) buf; > + > + hd->version[0] = cpu_to_le16(GDLM_VER_MAJOR); > + hd->version[1] = cpu_to_le16(GDLM_VER_MINOR); > + hd->version[2] = cpu_to_le16(GDLM_VER_PATCH); > + hd->type = cpu_to_le16(hd->type); > + hd->nodeid = cpu_to_le32(hd->nodeid); > + 
hd->to_nodeid = cpu_to_le32(hd->to_nodeid); > + memcpy(hd->name, mg->name, strlen(mg->name)); > + > + return _send_message(daemon_handle, buf, len); > +} > + > diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/group.c cluster/gfs/lock_dlm/daemon/group.c > --- cluster-HEAD/gfs/lock_dlm/daemon/group.c 2006-06-07 12:10:32.102338261 -0500 > +++ cluster/gfs/lock_dlm/daemon/group.c 2006-06-06 17:23:06.523976113 -0500 > @@ -21,25 +21,14 @@ > static int cb_event_nr; > static unsigned int cb_id; > static int cb_type; > -static int cb_nodeid; > -static int cb_len; > static int cb_member_count; > static int cb_members[MAX_GROUP_MEMBERS]; > -static char cb_message[MAX_MSGLEN+1]; > > int do_stop(struct mountgroup *mg); > int do_finish(struct mountgroup *mg); > int do_terminate(struct mountgroup *mg); > int do_start(struct mountgroup *mg, int type, int count, int *nodeids); > > -void receive_journals(struct mountgroup *mg, char *buf, int len, int from); > -void receive_options(struct mountgroup *mg, char *buf, int len, int from); > -void receive_remount(struct mountgroup *mg, char *buf, int len, int from); > -void receive_plock(struct mountgroup *mg, char *buf, int len, int from); > -void receive_recovery_status(struct mountgroup *mg, char *buf, int len, > - int from); > -void receive_recovery_done(struct mountgroup *mg, char *buf, int len, int from); > - > > static void stop_cbfn(group_handle_t h, void *private, char *name) > { > @@ -87,17 +76,9 @@ > static void deliver_cbfn(group_handle_t h, void *private, char *name, > int nodeid, int len, char *buf) > { > - int n; > - cb_action = DO_DELIVER; > - strncpy(cb_name, name, MAX_GROUP_NAME_LEN); > - cb_nodeid = nodeid; > - cb_len = n = len; > - if (len > MAX_MSGLEN) > - n = MAX_MSGLEN; > - memcpy(&cb_message, buf, n); > } > > -group_callbacks_t callbacks = { > +static group_callbacks_t callbacks = { > stop_cbfn, > start_cbfn, > finish_cbfn, > @@ -106,53 +87,6 @@ > deliver_cbfn > }; > > -static void do_deliver(struct mountgroup *mg) > -{ > - struct gdlm_header *hd; > - > - hd = (struct gdlm_header *) cb_message; > - > - /* If there are some group messages between a new node being added to > - the cpg group and being added to the app group, the new node should > - discard them since they're only relevant to the app group. 
*/ > - > - if (!mg->last_callback) { > - log_group(mg, "discard message type %d len %d from %d", > - hd->type, cb_len, cb_nodeid); > - return; > - } > - > - switch (hd->type) { > - case MSG_JOURNAL: > - receive_journals(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - case MSG_OPTIONS: > - receive_options(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - case MSG_REMOUNT: > - receive_remount(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - case MSG_PLOCK: > - receive_plock(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - case MSG_RECOVERY_STATUS: > - receive_recovery_status(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - case MSG_RECOVERY_DONE: > - receive_recovery_done(mg, cb_message, cb_len, cb_nodeid); > - break; > - > - default: > - log_error("unknown message type %d from %d", > - hd->type, hd->nodeid); > - } > -} > - > char *str_members(void) > { > static char buf[MAXLINE]; > @@ -222,12 +156,6 @@ > mg->id = cb_id; > break; > > - case DO_DELIVER: > - log_debug("groupd callback: deliver %s len %d nodeid %d", > - cb_name, cb_len, cb_nodeid); > - do_deliver(mg); > - break; > - > default: > error = -EINVAL; > } > @@ -257,15 +185,3 @@ > return rv; > } > > -int send_group_message(struct mountgroup *mg, int len, char *buf) > -{ > - int error; > - > - error = group_send(gh, mg->name, len, buf); > - if (error < 0) > - log_error("group_send error %d errno %d", error, errno); > - else > - error = 0; > - return error; > -} > - > diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h cluster/gfs/lock_dlm/daemon/lock_dlm.h > --- cluster-HEAD/gfs/lock_dlm/daemon/lock_dlm.h 2006-05-25 14:30:40.000000000 -0500 > +++ cluster/gfs/lock_dlm/daemon/lock_dlm.h 2006-06-06 17:18:25.510916543 -0500 > @@ -201,11 +201,16 @@ > MSG_RECOVERY_DONE, > }; > > +#define GDLM_VER_MAJOR 1 > +#define GDLM_VER_MINOR 0 > +#define GDLM_VER_PATCH 0 > + > struct gdlm_header { > uint16_t version[3]; > uint16_t type; /* MSG_ */ > uint32_t nodeid; /* sender */ > uint32_t to_nodeid; /* 0 if to all */ > + char name[MAXNAME]; > }; > > > @@ -214,6 +219,8 @@ > > int setup_cman(void); > int process_cman(void); > +int setup_cpg(void); > +int process_cpg(void); > int setup_groupd(void); > int process_groupd(void); > int setup_libdlm(void); > diff -urN -X dontdiff cluster-HEAD/gfs/lock_dlm/daemon/main.c cluster/gfs/lock_dlm/daemon/main.c > --- cluster-HEAD/gfs/lock_dlm/daemon/main.c 2006-04-21 14:54:10.000000000 -0500 > +++ cluster/gfs/lock_dlm/daemon/main.c 2006-06-07 11:59:12.248223925 -0500 > @@ -25,6 +25,7 @@ > static struct pollfd pollfd[MAX_CLIENTS]; > > static int cman_fd; > +static int cpg_fd; > static int listen_fd; > static int groupd_fd; > static int uevent_fd; > @@ -249,6 +250,11 @@ > goto out; > client_add(cman_fd, &maxi); > > + rv = cpg_fd = setup_cpg(); > + if (rv < 0) > + goto out; > + client_add(cpg_fd, &maxi); > + > rv = groupd_fd = setup_groupd(); > if (rv < 0) > goto out; > @@ -272,6 +278,8 @@ > goto out; > client_add(plocks_fd, &maxi); > > + log_debug("setup done"); > + > for (;;) { > rv = poll(pollfd, maxi + 1, -1); > if (rv < 0) > @@ -296,6 +304,8 @@ > process_groupd(); > else if (pollfd[i].fd == cman_fd) > process_cman(); > + else if (pollfd[i].fd == cpg_fd) > + process_cpg(); > else if (pollfd[i].fd == uevent_fd) > process_uevent(); > else if (!no_withdraw && > @@ -310,7 +320,6 @@ > if (pollfd[i].revents & POLLHUP) { > if (pollfd[i].fd == cman_fd) > exit_cman(); > - log_debug("closing fd %d", pollfd[i].fd); > close(pollfd[i].fd); > } > } > > -- > Linux-cluster mailing list > 
Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From sdake at redhat.com Thu Jun 8 21:10:21 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 08 Jun 2006 14:10:21 -0700 Subject: [Linux-cluster] Updates to libgfs2 In-Reply-To: <1149801419.12291.9.camel@technetium.msp.redhat.com> References: <1149801419.12291.9.camel@technetium.msp.redhat.com> Message-ID: <1149801021.20886.4.camel@shih.broked.org> Bob, The copy_form_mem function looks as though it may break strict aliasing rules set by the ISO C 99 standard. Have you tried compiling with - Wstrict-aliasing=2 as a CFLAGS option? If you receive no warnings here, you should be ok. Regards -steve On Thu, 2006-06-08 at 16:16 -0500, Robert S Peterson wrote: > Hi, > > I just wanted to let you know: I made some bug fixes to libgfs2 for > problems with fsck. The following is a patch with the code changes: > Also, there were some parts that got missed from the original commit > that are there now. > > Regards, > > Bob Peterson > Red Hat Cluster Suite > > Index: buf.c > =================================================================== > RCS file: /cvs/cluster/cluster/gfs2/libgfs2/buf.c,v > retrieving revision 1.2 > diff -w -u -p -u -p -r1.2 buf.c > --- buf.c 6 Jun 2006 14:20:41 -0000 1.2 > +++ buf.c 8 Jun 2006 20:58:48 -0000 > @@ -188,15 +188,17 @@ void bsync(struct gfs2_sbd *sdp) > /* commit buffers to disk but do not discard */ > void bcommit(struct gfs2_sbd *sdp) > { > - osi_list_t *tmp; > + osi_list_t *tmp, *x; > struct gfs2_buffer_head *bh; > > - osi_list_foreach(tmp, &sdp->buf_list) { > + osi_list_foreach_safe(tmp, &sdp->buf_list, x) { > bh = osi_list_entry(tmp, struct gfs2_buffer_head, b_list); > - if (bh->b_changed) { > + if (!bh->b_count) /* if not reserved for later */ > + write_buffer(sdp, bh); /* write the data, free the memory */ > + else if (bh->b_changed) { /* if buffer has changed */ > do_lseek(sdp, bh->b_blocknr * sdp->bsize); > - do_write(sdp, bh->b_data, sdp->bsize); > - bh->b_changed = FALSE; > + do_write(sdp, bh->b_data, sdp->bsize); /* write it out */ > + bh->b_changed = FALSE; /* no longer changed */ > } > } > } > Index: fs_ops.c > =================================================================== > RCS file: /cvs/cluster/cluster/gfs2/libgfs2/fs_ops.c,v > retrieving revision 1.2 > diff -w -u -p -u -p -r1.2 fs_ops.c > --- fs_ops.c 6 Jun 2006 14:20:41 -0000 1.2 > +++ fs_ops.c 8 Jun 2006 20:58:49 -0000 > @@ -502,14 +502,12 @@ int gfs2_readi(struct gfs2_inode *ip, vo > return copied; > } > > -static void > -copy_from_mem(struct gfs2_buffer_head *bh, void **buf, > +static void copy_from_mem(struct gfs2_buffer_head *bh, void **buf, > unsigned int offset, unsigned int size) > { > char **p = (char **)buf; > > memcpy(bh->b_data + offset, *p, size); > - > *p += size; > } > > @@ -526,7 +524,6 @@ int gfs2_writei(struct gfs2_inode *ip, v > int isdir = !!(S_ISDIR(ip->i_di.di_flags)); > const uint64_t start = offset; > int copied = 0; > - enum update_flags f; > > if (!size) > return 0; > @@ -558,7 +555,6 @@ int gfs2_writei(struct gfs2_inode *ip, v > block_map(ip, lblock, &new, &dblock, &extlen); > } > > - f = not_updated; > if (new) { > bh = bget(sdp, dblock); > if (isdir) { > @@ -567,12 +563,11 @@ int gfs2_writei(struct gfs2_inode *ip, v > mh.mh_type = GFS2_METATYPE_JD; > mh.mh_format = GFS2_FORMAT_JD; > gfs2_meta_header_out(&mh, bh->b_data); > - f = updated; > } > } else > bh = bread(sdp, dblock); > copy_from_mem(bh, &buf, o, amount); > - brelse(bh, f); > + brelse(bh, updated); > > copied += 
amount; > lblock++; > @@ -1084,8 +1079,7 @@ dir_make_exhash(struct gfs2_inode *dip) > dip->i_di.di_depth = y; > } > > -static void > -dir_l_add(struct gfs2_inode *dip, char *filename, int len, > +static void dir_l_add(struct gfs2_inode *dip, char *filename, int len, > struct gfs2_inum *inum, unsigned int type) > { > struct gfs2_dirent *dent; > @@ -1564,11 +1558,10 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui > struct gfs2_inode *ip; > struct gfs2_buffer_head *bh; > int x; > - uint64_t p, freed_blocks; > + uint64_t p; > unsigned char *buf; > struct rgrp_list *rgd; > > - freed_blocks = 0; > bh = bread(sdp, block); > ip = inode_get(sdp, bh); > if (ip->i_di.di_height > 0) { > @@ -1578,14 +1571,19 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui > x += sizeof(uint64_t)) { > p = be64_to_cpu(*(uint64_t *)(buf + x)); > if (p) { > - freed_blocks++; > gfs2_set_bitmap(sdp, p, GFS2_BLKST_FREE); > + /* We need to adjust the free space count for the freed */ > + /* indirect block. */ > + rgd = gfs2_blk2rgrpd(sdp, p); /* find the rg for indir block */ > + bh = bget(sdp, rgd->ri.ri_addr); /* get the buffer its rg */ > + rgd->rg.rg_free++; /* adjust the free count */ > + gfs2_rgrp_out(&rgd->rg, bh->b_data); /* back to the buffer */ > + brelse(bh, updated); /* release the buffer */ > } > } > } > /* Set the bitmap type for inode to free space: */ > gfs2_set_bitmap(sdp, ip->i_di.di_num.no_addr, GFS2_BLKST_FREE); > - freed_blocks++; /* one for the inode itself */ > inode_put(ip, updated); > /* Now we have to adjust the rg freespace count and inode count: */ > rgd = gfs2_blk2rgrpd(sdp, block); > @@ -1593,7 +1591,7 @@ int gfs2_freedi(struct gfs2_sbd *sdp, ui > /* buffer in memory for the rg on disk because we used it to fix the */ > /* bitmaps, some of which are on the same block on disk. */ > bh = bread(sdp, rgd->ri.ri_addr); /* get the buffer */ > - rgd->rg.rg_free += freed_blocks; > + rgd->rg.rg_free++; > rgd->rg.rg_dinodes--; /* one less inode in use */ > gfs2_rgrp_out(&rgd->rg, bh->b_data); > brelse(bh, updated); /* release the buffer */ > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From sdake at redhat.com Thu Jun 8 21:16:03 2006 From: sdake at redhat.com (Steven Dake) Date: Thu, 08 Jun 2006 14:16:03 -0700 Subject: [Linux-cluster] Patch to gfs2_convert In-Reply-To: <1149802001.12291.15.camel@technetium.msp.redhat.com> References: <1149802001.12291.15.camel@technetium.msp.redhat.com> Message-ID: <1149801363.20886.9.camel@shih.broked.org> Bob, Your patch looks good to me. One issue that could occur which I'm not sure is handled is the failure of the conversion right during the gfs1 to gfs2 inode conversion. Is it possible for half the data structure to be converted then failure to occur causing the inode to be in a half- converted state? I don't know enough about the code to be sure of this case. Regards -steve On Thu, 2006-06-08 at 16:26 -0500, Robert S Peterson wrote: > Hi, > > This patch to gfs2_convert makes it much more forgiving when fs conversions > are interrupted in the middle due to power loss, interrupts, or other > reasons. Now, if a filesystem conversion is interrupted mid-way through, > the tool should be able to pick up where it left off without damage. > > As always, send questions, comments and concerns to me. If I don't hear > from anybody, I'll commit it to cvs in a few days. 
> > Regards, > > Bob Peterson > Red Hat Cluster Suite > > Index: gfs2_convert.c > =================================================================== > RCS file: /cvs/cluster/cluster/gfs2/convert/gfs2_convert.c,v > retrieving revision 1.2 > diff -w -u -p -u -p -r1.2 gfs2_convert.c > --- gfs2_convert.c 6 Jun 2006 14:37:47 -0000 1.2 > +++ gfs2_convert.c 8 Jun 2006 21:13:37 -0000 > @@ -77,12 +77,14 @@ void convert_bitmaps(struct gfs2_sbd *sd > int x, y; > struct gfs2_rindex *ri; > unsigned char state; > + struct gfs2_buffer_head *bh; > > ri = &rgd2->ri; > gfs2_compute_bitstructs(sdp, rgd2); /* mallocs bh as array */ > for (blk = 0; blk < ri->ri_length; blk++) { > - rgd2->bh[blk] = bget_generic(sdp, ri->ri_addr + blk, read_disk, > - read_disk); > + bh = bget_generic(sdp, ri->ri_addr + blk, read_disk, read_disk); > + if (!rgd2->bh[blk]) > + rgd2->bh[blk] = bh; > x = (blk) ? sizeof(struct gfs2_meta_header) : sizeof(struct gfs2_rgrp); > > for (; x < sdp->bsize; x++) > @@ -92,7 +94,6 @@ void convert_bitmaps(struct gfs2_sbd *sd > if (state == 0x02) /* unallocated metadata state invalid */ > rgd2->bh[blk]->b_data[x] &= ~(0x02 << (GFS2_BIT_SIZE * y)); > } > - brelse(rgd2->bh[blk], updated); > } > }/* convert_bitmaps */ > > @@ -134,10 +135,8 @@ static int superblock_cvt(int disk_fd, c > /* convert the ondisk sb structure */ > /* --------------------------------- */ > sb2->sd_sb.sb_header.mh_magic = GFS2_MAGIC; > - sb2->sd_sb.sb_fs_format = GFS2_FORMAT_FS; > sb2->sd_sb.sb_header.mh_type = GFS2_METATYPE_SB; > sb2->sd_sb.sb_header.mh_format = GFS2_FORMAT_SB; > - sb2->sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI; > sb2->sd_sb.sb_bsize = sb1->sd_sb.sb_bsize; > sb2->sd_sb.sb_bsize_shift = sb1->sd_sb.sb_bsize_shift; > strcpy(sb2->sd_sb.sb_lockproto, sb1->sd_sb.sb_lockproto); > @@ -174,14 +173,14 @@ static int superblock_cvt(int disk_fd, c > rgd2->ri.ri_data0 = rgd->rd_ri.ri_data1; > rgd2->ri.ri_data = rgd->rd_ri.ri_data; > rgd2->ri.ri_bitbytes = rgd->rd_ri.ri_bitbytes; > - /* commit the changes to a gfs2 buffer */ > - bh = bread(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */ > - gfs2_rgrp_out(&rgd2->rg, bh->b_data); > - brelse(bh, updated); /* release the buffer */ > /* Add the new gfs2 rg to our list: We'll output the index later. */ > osi_list_add_prev((osi_list_t *)&rgd2->list, > (osi_list_t *)&sb2->rglist); > convert_bitmaps(sb2, rgd2, TRUE); > + /* Write the updated rgrp to the gfs2 buffer */ > + bh = bget(sb2, rgd2->ri.ri_addr); /* get a gfs2 buffer for the rg */ > + gfs2_rgrp_out(&rgd2->rg, rgd2->bh[0]->b_data); > + brelse(bh, updated); /* release the buffer */ > } > return 0; > }/* superblock_cvt */ > @@ -195,8 +194,12 @@ int adjust_inode(struct gfs2_sbd *sbp, s > { > struct gfs2_inode *inode; > struct inode_block *fixdir; > + int inode_was_gfs1; > > inode = inode_get(sbp, bh); > + > + inode_was_gfs1 = (inode->i_di.di_num.no_formal_ino == > + inode->i_di.di_num.no_addr); > /* Fix the inode number: */ > inode->i_di.di_num.no_formal_ino = sbp->md.next_inum; ; > > @@ -240,11 +243,23 @@ int adjust_inode(struct gfs2_sbd *sbp, s > /* di_goal_meta has shifted locations and di_goal_data has */ > /* changed from 32-bits to 64-bits. The following code */ > /* adjusts for the shift. */ > + /* */ > + /* Note: It may sound absurd, but we need to check if this */ > + /* inode has already been converted to gfs2 or if it's */ > + /* still a gfs1 inode. 
That's just in case there was a */ > + /* prior attempt to run gfs2_convert that never finished */ > + /* (due to power out, ctrl-c, kill, segfault, whatever.) */ > + /* If it is unconverted gfs1 we want to do a full */ > + /* conversion. If it's a gfs2 inode from a prior run, */ > + /* we still need to renumber the inode, but here we */ > + /* don't want to shift the data around. */ > /* ----------------------------------------------------------- */ > + if (inode_was_gfs1) { > inode->i_di.di_goal_meta = inode->i_di.di_goal_data; > inode->i_di.di_goal_data = 0; /* make sure the upper 32b are 0 */ > inode->i_di.di_goal_data = inode->i_di.__pad[0]; > inode->i_di.__pad[1] = 0; > + } > > gfs2_dinode_out(&inode->i_di, bh->b_data); > sbp->md.next_inum++; /* update inode count */ > @@ -344,7 +359,7 @@ int inode_renumber(struct gfs2_sbd *sbp, > /* ------------------------------------------------------------------------- */ > /* fetch_inum - fetch an inum entry from disk, given its block */ > /* ------------------------------------------------------------------------- */ > -int fetch_and_fix_inum(struct gfs2_sbd *sbp, uint64_t iblock, > +int fetch_inum(struct gfs2_sbd *sbp, uint64_t iblock, > struct gfs2_inum *inum) > { > struct gfs2_buffer_head *bh_fix; > @@ -356,7 +371,7 @@ int fetch_and_fix_inum(struct gfs2_sbd * > inum->no_addr = fix_inode->i_di.di_num.no_addr; > brelse(bh_fix, updated); > return 0; > -}/* fetch_and_fix_inum */ > +}/* fetch_inum */ > > /* ------------------------------------------------------------------------- */ > /* process_dirent_info - fix one dirent (directory entry) buffer */ > @@ -382,6 +397,7 @@ int process_dirent_info(struct gfs2_inod > /* Turns out you can't trust dir_entries is correct. */ > for (de = 0; ; de++) { > struct gfs2_inum inum; > + int dent_was_gfs1; > > gettimeofday(&tv, NULL); > /* Do more warm fuzzy stuff for the customer. */ > @@ -394,18 +410,24 @@ int process_dirent_info(struct gfs2_inod > } > /* fix the dirent's inode number based on the inode */ > gfs2_inum_in(&inum, (char *)&dent->de_inum); > + dent_was_gfs1 = (dent->de_inum.no_addr == dent->de_inum.no_formal_ino); > if (inum.no_formal_ino) { /* if not a sentinel (placeholder) */ > - error = fetch_and_fix_inum(sbp, inum.no_addr, &inum); > + error = fetch_inum(sbp, inum.no_addr, &inum); > if (error) { > printf("Error retrieving inode %" PRIx64 "\n", inum.no_addr); > break; > } > + /* fix the dirent's inode number from the fetched inum. */ > + dent->de_inum.no_formal_ino = cpu_to_be64(inum.no_formal_ino); > } > /* Fix the dirent's filename hash: They are the same as gfs1 */ > /* dent->de_hash = cpu_to_be32(gfs2_disk_hash((char *)(dent + 1), */ > /* be16_to_cpu(dent->de_name_len))); */ > /* Fix the dirent's file type. Gfs1 used home-grown values. */ > /* Gfs2 uses standard values from include/linux/fs.h */ > + /* Only do this if the dent was a true gfs1 dent, and not a */ > + /* gfs2 dent converted from a previously aborted run. 
*/ > + if (dent_was_gfs1) { > switch be16_to_cpu(dent->de_type) { > case GFS_FILE_NON: > dent->de_type = cpu_to_be16(DT_UNKNOWN); > @@ -432,7 +454,7 @@ int process_dirent_info(struct gfs2_inod > dent->de_type = cpu_to_be16(DT_SOCK); > break; > } > - > + } > error = gfs2_dirent_next(dip, bh, &dent); > if (error) > break; > @@ -948,26 +970,33 @@ int main(int argc, char **argv) > inode_put(sb2.md.inum, updated); > inode_put(sb2.md.statfs, updated); > > - bh = bread(&sb2, sb2.sb_addr); > - gfs2_sb_out(&sb2.sd_sb, bh->b_data); > - brelse(bh, updated); > bcommit(&sb2); /* write the buffers to disk */ > > /* Now delete the now-obsolete gfs1 files: */ > printf("Removing obsolete gfs1 structures.\n"); > fflush(stdout); > - /* Delete the Journal index: */ > + /* Delete the old gfs1 Journal index: */ > gfs2_freedi(&sb2, sb.sd_sb.sb_jindex_di.no_addr); > - /* Delete the rgindex: */ > + /* Delete the old gfs1 rgindex: */ > gfs2_freedi(&sb2, sb.sd_sb.sb_rindex_di.no_addr); > - /* Delete the Quota file: */ > + /* Delete the old gfs1 Quota file: */ > gfs2_freedi(&sb2, sb.sd_sb.sb_quota_di.no_addr); > - /* Delete the License file: */ > + /* Delete the old gfs1 License file: */ > gfs2_freedi(&sb2, sb.sd_sb.sb_license_di.no_addr); > - /* Now free all the rgrps */ > + /* Now free all the in memory */ > gfs2_rgrp_free(&sb2, updated); > printf("Committing changes to disk.\n"); > fflush(stdout); > + /* Set filesystem type in superblock to gfs2. We do this at the */ > + /* end because if the tool is interrupted in the middle, we want */ > + /* it to not reject the partially converted fs as already done */ > + /* when it's run a second time. */ > + bh = bread(&sb2, sb2.sb_addr); > + sb2.sd_sb.sb_fs_format = GFS2_FORMAT_FS; > + sb2.sd_sb.sb_multihost_format = GFS2_FORMAT_MULTI; > + gfs2_sb_out(&sb2.sd_sb, bh->b_data); > + brelse(bh, updated); > + > bsync(&sb2); /* write the buffers to disk */ > error = fsync(disk_fd); > if (error) > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ocrete at max-t.com Fri Jun 9 21:40:23 2006 From: ocrete at max-t.com (Olivier =?ISO-8859-1?Q?Cr=EAte?=) Date: Fri, 09 Jun 2006 17:40:23 -0400 Subject: [Linux-cluster] Kernel panic - not syncing: membership stopped responding Message-ID: <1149889224.7865.99.camel@cocagne.max-t.internal> Hi, We rebooted one machine is our cluster (which uses cman) and we got the following message on the changelog of all of the other machines of the cluster (and they panicked!)... We are using a snapshot of the STABLE branch from May 9, 2006. It seems strange to panic the kernel for a ENOMEM... from syslog: Error queueing request to port 1: -12 kernel: Kernel panic - not syncing: membership stopped responding -- Olivier Cr?te ocrete at max-t.com Maximum Throughput Inc. From wcheng at redhat.com Mon Jun 12 05:25:43 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 01:25:43 -0400 Subject: [Linux-cluster] [RFC] NLM lock failover admin interface Message-ID: <1150089943.26019.18.camel@localhost.localdomain> NFS v2/v3 active-active NLM lock failover has been an issue with our cluster suite. With current implementation, it (cluster suite) is trying to carry the workaround as much as it can with user mode scripts where, upon failover, on taken-over server, it: 1. Tear down virtual IP. 2. Unexport the subject NFS export. 3. Signal lockd to drop the locks. 4. Un-mount filesystem if needed. 
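In rough script form, the sequence those scripts run on the taken-over server looks something like the sketch below (the address, interface and export path are placeholders, and the way lockd is located and signalled differs between the actual resource agents):

ip addr del 10.0.0.10/24 dev eth0     # 1. tear down the virtual (service) IP
exportfs -u "*:/mnt/shared"           # 2. unexport the subject NFS export
kill -KILL $(pidof lockd)             # 3. signal lockd; on SIGKILL it drops all of
                                      #    its locks, not just those for this export
umount /mnt/shared                    # 4. un-mount the filesystem if needed
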
There are many other issues (such as /var/lib/nfs/statd/sm file, etc) but this particular post is to further refine step 3 to avoid the 50 second global (default) grace period for all NFS exports; i.e., we would like to be able to selectively drop locks (only) associated with the requested exports without disrupting other NFS services. We've done some prototype (coding) works but would like to search for community consensus on the admin interface if possible. We've tried out the following: 1. /proc interface, say writing the fsid into a /proc directory entry would end up dropping all NLM locks associated with the NFS export that has fsid in its /etc/exports file. 2. Adding a new flag into "exportfs" command, say "h", such that "exportfs -uh *:/export_path" would un-export the entry and drop the NLM locks associated with the entry. 3. Add a new nfsctl by re-using a 2.4 kernel flag (NFSCTL_FOLOCKS) where it takes: struct nfsctl_folocks { int type; unsigned int fsid; unsigned int devno; } as input argument. Depending on "type", the kernel call would drop the locks associated with either the fsid, or devno. The core of the implementation is a new cloned version of nlm_traverse_files() where it searches the "nlm_files" list one by one to compare the fsid (or devno) based on nlm_file.f_handle field. A helper function is also implemented to extract the fsid (or devno) from f_handle. The new function is planned to allow failover to abort if the file can't be closed. We may also put the file locks back if abort occurs. Would appreciate comments on the above admin interface. As soon as the external interface can be finalized, the code will be submitted for review. -- Wendy From wcheng at redhat.com Mon Jun 12 06:11:04 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 02:11:04 -0400 Subject: [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <1150089943.26019.18.camel@localhost.localdomain> References: <1150089943.26019.18.camel@localhost.localdomain> Message-ID: <1150092664.26180.19.camel@localhost.localdomain> On Mon, 2006-06-12 at 01:25 -0400, Wendy Cheng wrote: > NFS v2/v3 active-active NLM lock failover has been an issue with our > cluster suite. With current implementation, it (cluster suite) is trying > to carry the workaround as much as it can with user mode scripts where, > upon failover, on taken-over server, it: > > 1. Tear down virtual IP. > 2. Unexport the subject NFS export. > 3. Signal lockd to drop the locks. > 4. Un-mount filesystem if needed. > > There are many other issues (such as /var/lib/nfs/statd/sm file, etc) > but this particular post is to further refine step 3 to avoid the 50 > second global (default) grace period for all NFS exports; i.e., we would > like to be able to selectively drop locks (only) associated with the > requested exports without disrupting other NFS services. > > We've done some prototype (coding) works but would like to search for > community consensus on the admin interface if possible. While ping-pong the emails with our base kernel folks to choose between /proc, or exportfs, or nfsctl (internally within the company - mostly with steved and staubach), Peter suggested to try out multiple lockd(s) to handle different NFS exports. In that case, we may require to change a big portion of lockd kernel code. I prefer not going that far since lockd failover is our cluster suite's immediate issue. However, if this approach can get everyone's vote, we'll comply. 
-- Wendy From rpeterso at redhat.com Mon Jun 12 14:51:04 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Mon, 12 Jun 2006 09:51:04 -0500 Subject: [Linux-cluster] Updates to libgfs2 In-Reply-To: <1149801021.20886.4.camel@shih.broked.org> References: <1149801419.12291.9.camel@technetium.msp.redhat.com> <1149801021.20886.4.camel@shih.broked.org> Message-ID: <448D7F58.2010603@redhat.com> Steven Dake wrote: > Bob, > > The copy_form_mem function looks as though it may break strict aliasing > rules set by the ISO C 99 standard. Have you tried compiling with - > Wstrict-aliasing=2 as a CFLAGS option? If you receive no warnings here, > you should be ok. > > Regards > -steve > Hi Steve, Thanks for the input. It compiles without warning with: -Wstrict-aliasing=2 Regards, Bob Peterson Red Hat Cluster Suite From Jon.Stanley at savvis.net Mon Jun 12 14:45:03 2006 From: Jon.Stanley at savvis.net (Stanley, Jon) Date: Mon, 12 Jun 2006 09:45:03 -0500 Subject: [Linux-cluster] [RFC] NLM lock failover admin interface Message-ID: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net> > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng > Sent: Monday, June 12, 2006 12:26 AM > To: nfs at lists.sourceforge.net > Cc: linux-cluster at redhat.com > Subject: [Linux-cluster] [RFC] NLM lock failover admin interface > NOTE - I don't use NFS functionality in Cluster Suite, so my coments may be entirely meaningless. > > 1. /proc interface, say writing the fsid into a /proc directory entry > would end up dropping all NLM locks associated with the NFS > export that > has fsid in its /etc/exports file. This would defintely have it's advantages for people who know what they're doing - they could drop all locks without unexporting the filesystem. However, it also gives people the opportunity to shoot themselves in the foot - by eliminating locks that are needed. After weighing the pros and cons, I really don't think that any method accessible via /proc is a good idea. > > 2. Adding a new flag into "exportfs" command, say "h", such that > > "exportfs -uh *:/export_path" > > would un-export the entry and drop the NLM locks associated with the > entry. > This is the best of the three, IMHO. Gives you the safety of *knowing* that the filesystem was unexported before dropping the locks, and preventing folks from shooting themselves in the foot. The other option that was mentioned, a separate lockd for each fs, is also a good idea - but would require a lot of coding no doubt, and introduce more instability into what I already preceive as an unstable NFS subsystem in Linux (I *refuse* to use Linux as an NFS server and instead go with Solaris - I've had *really* bad experiences with Linux NFS under load - but that's getting OT). From rpeterso at redhat.com Mon Jun 12 14:56:27 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Mon, 12 Jun 2006 09:56:27 -0500 Subject: [Linux-cluster] Patch to gfs2_convert In-Reply-To: <1149801363.20886.9.camel@shih.broked.org> References: <1149802001.12291.15.camel@technetium.msp.redhat.com> <1149801363.20886.9.camel@shih.broked.org> Message-ID: <448D809B.7000906@redhat.com> Steven Dake wrote: > Bob, > > Your patch looks good to me. One issue that could occur which I'm not > sure is handled is the failure of the conversion right during the gfs1 > to gfs2 inode conversion. 
Is it possible for half the data structure to > be converted then failure to occur causing the inode to be in a half- > converted state? I don't know enough about the code to be sure of this > case. > > Regards > -steve Hi Steve, Again, thanks for your input. With the convert tool, the inodes are converted into buffers and the buffers are eventually written out to disk, so either an inode is fully converted or not at all. The latest version of the gfs2_convert tool determines on a per-inode basis whether it has been converted or not, and converts it as appropriate. That way, conversions that are interrupted may be safely resumed later. Regards, Bob Peterson Red Hat Cluster Suite From aneesh.kumar at gmail.com Mon Jun 12 14:51:57 2006 From: aneesh.kumar at gmail.com (Aneesh Kumar) Date: Mon, 12 Jun 2006 20:21:57 +0530 Subject: [Linux-cluster] [RFC] Transport independent Cluster service Message-ID: Hi, I am right now working on ci-linux.sf.net project. The goal is to get the code ready so that projects like GFS and OCFS2 can start using the framework. CI/ICS allows to build cluster service without worrying about the transport mechanism used. With the OpenSSI project we had both IP and infiniband transport and i belive it should be easy to implement one using sctp or TIPC. You can see the result of my work here http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=summary Different components of CI is explained here http://ci-linux.sourceforge.net/components.shtml I have dropped CLMS and CLMS key service. I am looking at using CMAN/configfs for doing the cluster membership part. Registering new cluster service is explained here. http://ci-linux.sourceforge.net/enhancing.shtml here also you can drop the CLMS part . This link explains how to write new cluster service. http://ci-linux.sourceforge.net/ics.shtml A simple example is below http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=blob;h=8282ad15da09901f4cd4bdd62490766458d1cebf;hb=f7e456933b9868486c014a83e473f647149a71f6;f=include/cluster/gen/icssig.svc http://git.openssi.org/~kvaneesh/gitweb.cgi?p=ci-to-linus.git;a=blob;h=1004328006021f5287bb543abc7d15e51964eae6;hb=688587ead9ce2d26070ca051323769cd76c91185;f=include/cluster/gen/icstest.svc Please follow up at sic-linux-devel at lists.sourceforge.net -aneesh From bfields at fieldses.org Mon Jun 12 15:00:53 2006 From: bfields at fieldses.org (J. Bruce Fields) Date: Mon, 12 Jun 2006 11:00:53 -0400 Subject: [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <1150089943.26019.18.camel@localhost.localdomain> References: <1150089943.26019.18.camel@localhost.localdomain> Message-ID: <20060612150053.GC31596@fieldses.org> On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote: > 2. Adding a new flag into "exportfs" command, say "h", such that > > "exportfs -uh *:/export_path" > > would un-export the entry and drop the NLM locks associated with the > entry. What does the kernel interface end up looking like in that case? --b. From wcheng at redhat.com Mon Jun 12 15:44:55 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 11:44:55 -0400 Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <20060612150053.GC31596@fieldses.org> References: <1150089943.26019.18.camel@localhost.localdomain> <20060612150053.GC31596@fieldses.org> Message-ID: <448D8BF7.7010105@redhat.com> J. Bruce Fields wrote: >On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote: > > >>2. 
Adding a new flag into "exportfs" command, say "h", such that >> >> "exportfs -uh *:/export_path" >> >>would un-export the entry and drop the NLM locks associated with the >>entry. >> >> > >What does the kernel interface end up looking like in that case? > > > Happy to see this new exportfs command gets positive response - it was our original pick too. Uploaded is part of a draft version of 2.4 base kernel patch - we're cleaning up 2.6 patches at this moment. It basically adds a new export flag (NFSEXP_FOLOCK - note that ex_flags is an int but is currently only defined up to 16 bits) so nfs-util and kernel can communicate. The nice thing about this approach is the recovery part - the take-over server can use the counter part command to export and set grace period for one particular interface within the same system call. -- Wendy -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: gfs_nlm.patch URL: From m_pupil at yahoo.com.cn Mon Jun 12 15:58:50 2006 From: m_pupil at yahoo.com.cn (FKtPp) Date: Mon, 12 Jun 2006 23:58:50 +0800 (CST) Subject: [Linux-cluster] GFS 6.0 lock problem? Message-ID: <20060612155850.87286.qmail@web15009.mail.cnb.yahoo.com> hi list, I recentlt setup a two server GFS cluster. Both of them export a 8G partition useing gnbd_serv and import eaother with gnbd_import. I assambled these two gnbd block to a ~17G pool. After that, I run ccs_servd at one server ( hostname: gfs1 ) to serve the ccs archive, and started the ccsd, lock_gulmd. Then I run gfs_mkfs at gfs1, and mounted the newly created fs on both servers. Everything seems ok, before I write something to the fs mounted at the other server ( hostname: gfs2 ). When I mkdir at machine gfs2, machine gfs1 can't see that directory; but when I mkdir at machine gfs1, gfs2 can see that new directory . I tried dd a 1GB file to the filesystem at machine gfs2, but at machine gfs1, I still see nothing. How can I figure out what was wrong? The /var/log/message didn't provide enough information... __________________________________________________ ??????????????? http://cn.mail.yahoo.com From aneesh.kumar at gmail.com Mon Jun 12 17:21:37 2006 From: aneesh.kumar at gmail.com (Aneesh Kumar) Date: Mon, 12 Jun 2006 22:51:37 +0530 Subject: [Linux-cluster] Re: [RFC] Transport independent Cluster service In-Reply-To: References: Message-ID: To further explain the simplicity of writing a cluster service using this framework you can look at the below attached icstest service. with this calling ics_test_print(char *string) will cause the string to be printed on node 3. -aneesh -------------- next part -------------- A non-text attachment was scrubbed... Name: icstest.svc Type: application/octet-stream Size: 1175 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: icstest_cli.c Type: text/x-csrc Size: 652 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: icstest_ics.c Type: text/x-csrc Size: 1651 bytes Desc: not available URL: From wcheng at redhat.com Mon Jun 12 18:09:30 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 14:09:30 -0400 Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <448DE1C1.C935.0084.1@novell.com> References: <1150089943.26019.18.camel@localhost.localdomain> <20060612150053.GC31596@fieldses.org> <448D8BF7.7010105@redhat.com> <448DE1C1.C935.0084.1@novell.com> Message-ID: <448DADDA.2070209@redhat.com> Madhan P wrote: >For what it's worth, would second this approach of using a flag to >unexport and associating the cleanup with that. > Happy to have another vote :) ! It is appreicated. > Another quick hack we >used was to store the NSM entries on a standard location on the >respective exported filesystem, so that notification is sent once the >filesystem comes back online on the destination server and is exported >again. BTW, this was not on Linux. It was a simple solution providing >the necessary active/active and active/passive cluster support. > > Lon Hohberge (from our cluster suite team) has been working on similar setup too (to structure the MSM file directory). We'll submit the associated kernel patch when it is ready ("rpc.statd -H" needs some bandaids). Future reviews and comments are also appreciated. -- Wendy From m_pupil at 163.com Mon Jun 12 15:57:44 2006 From: m_pupil at 163.com (Kai) Date: Mon, 12 Jun 2006 23:57:44 +0800 Subject: [Linux-cluster] GFS 6.0 lock problem? Message-ID: <448D8EF8.8030602@163.com> hi list, I recentlt setup a two server GFS cluster. Both of them export a 8G partition useing gnbd_serv and import eaother with gnbd_import. I assambled these two gnbd block to a ~17G pool. After that, I run ccs_servd at one server ( hostname: gfs1 ) to serve the ccs archive, and started the ccsd, lock_gulmd. Then I run gfs_mkfs at gfs1, and mounted the newly created fs on both servers. Everything seems ok, before I write something to the fs mounted at the other server ( hostname: gfs2 ). When I mkdir at machine gfs2, machine gfs1 can't see that directory; but when I mkdir at machine gfs1, gfs2 can see that new directory . I tried dd a 1GB file to the filesystem at machine gfs2, but at machine gfs1, I still see nothing. How can I figure out what was wrong? The /var/log/message didn't provide enough information... From SteveD at redhat.com Mon Jun 12 17:23:02 2006 From: SteveD at redhat.com (Steve Dickson) Date: Mon, 12 Jun 2006 13:23:02 -0400 Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <448D8BF7.7010105@redhat.com> References: <1150089943.26019.18.camel@localhost.localdomain> <20060612150053.GC31596@fieldses.org> <448D8BF7.7010105@redhat.com> Message-ID: <448DA2F6.8080605@RedHat.com> Wendy Cheng wrote: > The nice thing about this approach is the recovery part - the take-over > server can use the counter part command to export and set grace period > for one particular interface within the same system call. Actually this is a pretty clean and simple interface... imho.. The only issue I had was adding a flag to an older version and then having to carry that flag forward... So if this interface is accepted and added to the mainline nfs-utils (which it should be.. imho) that fact it is so clean and simple would make the back porting fairly trivial... steved. 
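For anyone skimming the thread, the proposed pairing would look roughly like the sketch below from a failover script. None of it exists in shipped nfs-utils: the "h" letter is only the suggestion from the original RFC, and the counterpart "export plus per-export grace period" call has no settled syntax yet, so the second command is purely illustrative:

  # on the node giving the service up: unexport AND drop its NLM locks (proposed "-h")
  exportfs -uh '*:/mnt/ha_export'
  # on the node taking it over: the counterpart export-plus-grace call (syntax invented here)
  exportfs -h -o rw,fsid=7 '*:/mnt/ha_export'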
From jmy at sgi.com Mon Jun 12 17:27:12 2006 From: jmy at sgi.com (James Yarbrough) Date: Mon, 12 Jun 2006 10:27:12 -0700 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface References: <1150089943.26019.18.camel@localhost.localdomain> Message-ID: <448DA3F0.AF1C8540@sgi.com> > 2. Adding a new flag into "exportfs" command, say "h", such that > > "exportfs -uh *:/export_path" > > would un-export the entry and drop the NLM locks associated with the > entry. This is fine for releasing the locks, but how do you plan to re-enter the grace period for reclaiming the locks when you relocate the export? And how do you intend to segregate the export for which reclaims are valid from the ones which are not? How do you plan to support the sending of SM_NOTIFY? This might be where a lockd per export has an advantage. -- jmy at sgi.com 650 933 3124 Why is there a snake in my Coke? From PMadhan at novell.com Mon Jun 12 16:20:57 2006 From: PMadhan at novell.com (Madhan P) Date: Mon, 12 Jun 2006 10:20:57 -0600 Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <448D8BF7.7010105@redhat.com> References: <1150089943.26019.18.camel@localhost.localdomain> <20060612150053.GC31596@fieldses.org> <448D8BF7.7010105@redhat.com> Message-ID: <448DE1C1.C935.0084.1@novell.com> For what it's worth, would second this approach of using a flag to unexport and associating the cleanup with that. Another quick hack we used was to store the NSM entries on a standard location on the respective exported filesystem, so that notification is sent once the filesystem comes back online on the destination server and is exported again. BTW, this was not on Linux. It was a simple solution providing the necessary active/active and active/passive cluster support. - Madhan >>> On 6/12/2006 at 9:14:55 pm, in message <448D8BF7.7010105 at redhat.com>, Wendy Cheng wrote: > J. Bruce Fields wrote: > >>On Mon, Jun 12, 2006 at 01:25:43AM -0400, Wendy Cheng wrote: >> >> >>>2. Adding a new flag into "exportfs" command, say "h", such that >>> >>> "exportfs -uh *:/export_path" >>> >>>would un-export the entry and drop the NLM locks associated with the >>>entry. >>> >>> >> >>What does the kernel interface end up looking like in that case? >> >> >> > Happy to see this new exportfs command gets positive response - it was > our original pick too. > > Uploaded is part of a draft version of 2.4 base kernel patch - we're > cleaning up 2.6 patches at this moment. It basically adds a new export > flag (NFSEXP_FOLOCK - note that ex_flags is an int but is currently only > defined up to 16 bits) so nfs-util and kernel can communicate. > > The nice thing about this approach is the recovery part - the take-over > server can use the counter part command to export and set grace period > for one particular interface within the same system call. > > -- Wendy From rpeterso at redhat.com Mon Jun 12 16:27:03 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Mon, 12 Jun 2006 11:27:03 -0500 Subject: [Linux-cluster] Updates to gfs2 user tools Message-ID: <448D95D7.5040403@redhat.com> Hi Folks, Attached are my latest patches to the GFS2 user tools. Summary of the changes: libgfs2 - Porting of more functions from libgfs1 and fsck, mostly for gfs2_convert's sake. edit - Fixed a bug regarding the printing of stuffed directories (e.g. -p masterdir) convert - Got rid of libgfs dependency. Now does everything the libgfs2 way. Also some cleanup and tool standardization. 
fsck - Made all block values print out in decimal and hex. Moved functions to libgfs2 so gfs2_convert may use them. Change confusing l+f directory to "lost+found" to be compatible with e2fsprogs. man - Added man page for gfs2_convert. Renamed gfs2_mkfs man page to mkfs.gfs2 Deleted mkfs.gfs2 man page references to gulm. Feel free to send questions, comments or concerns. As usual, if I don't hear anything to the contrary, I will commit these changes to CVS in a day or two. (Maybe tomorrow due to build considerations). Regards, Bob Peterson Red Hat Cluster Suite -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: convert.update2.txt URL: From wcheng at redhat.com Mon Jun 12 19:07:23 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 15:07:23 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <448DA3F0.AF1C8540@sgi.com> References: <1150089943.26019.18.camel@localhost.localdomain> <448DA3F0.AF1C8540@sgi.com> Message-ID: <448DBB6B.7000408@redhat.com> James Yarbrough wrote: >>2. Adding a new flag into "exportfs" command, say "h", such that >> >> "exportfs -uh *:/export_path" >> >>would un-export the entry and drop the NLM locks associated with the >>entry. >> >> > >This is fine for releasing the locks, but how do you plan to re-enter >the grace period for reclaiming the locks when you relocate the export? >And how do you intend to segregate the export for which reclaims are >valid from the ones which are not? How do you plan to support the >sending of SM_NOTIFY? This might be where a lockd per export has an >advantage. > > > Yeah, that's why Peter's idea (different lockd(s)) is also attractive. However, on the practical side, we don't plan to introduce kernel patches agressively. The approach is to be away from mainline NLM code base until we have enough QA cycles to make sure things work. The unexport part would allow other nfs services on the taken-over server un-interrupted. On the take-over server side, we currently do a global grace period. The plan has been to put a little delay before fixing take-over server's logic due to other NLM/posix lock issues - for example, the current (linux) NLM doesn't bother to call filesystem's lock method (which virtually disables any cluster filesystem's NFS locking across different NFS servers). However, if we have enough resources and/or volunteers, we may do these things in parallel. The following are planned: Take-over server logic: 1. setup the statd sm file (currently /var/lib/nfs/statd/sm or the equivalent configured directory) properly. 2. rpc.statd is dispatched with "--ha-callout" option. 3. implement the ha-callout user mode program to create a seperate statd sm files for each exported ip. 4. export the target filesystem and set up grace period based on fsid (or devno). It will be used in NLM procedure calls by extracting the fsid (or devno) from nfs file handle to decide accepting or reject the not-reclaiming requests. 5. bring up the failover IP address. 6. send SM_NOTIFY to client machines using the configured sm directory created by the ha-callout program (rpc.statd -N -P). Step 4 will be the counter-part of our unexport flag. 
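Strung together on the take-over node, the plan might look roughly like this. The per-IP sm directory layout, the floating IP, the export line and the callout path are all assumptions; the per-export grace period of step 4 is the proposed interface and has no syntax yet, so it is not shown:

  # all names below are placeholders; this only mirrors the numbered plan above
  cp -a /var/lib/nfs/statd/sm.10.0.0.50/. /var/lib/nfs/statd/sm/    # 1. seed the statd sm files
  rpc.statd --ha-callout /usr/local/sbin/clu-statd-callout          # 2+3. statd with the HA callout
  exportfs -o rw,fsid=7 '*:/mnt/ha_export'                          # 4. re-export (per-export grace: proposed, not shown)
  ip addr add 10.0.0.50/24 dev eth0                                 # 5. bring up the failover IP
  rpc.statd -N -P /var/lib/nfs/statd/sm.10.0.0.50                   # 6. notify clients recorded for that IP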
-- Wendy From wcheng at redhat.com Tue Jun 13 03:39:31 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 12 Jun 2006 23:39:31 -0400 Subject: [NFS] [Linux-cluster] [RFC] NLM lock failover admin interface In-Reply-To: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net> References: <9A6FE0FCC2B29846824C5CD81C6647B902207776@s228130hz1ew08.apptix-01.savvis.net> Message-ID: <1150169971.27203.1.camel@localhost.localdomain> On Mon, 2006-06-12 at 09:45 -0500, Stanley, Jon wrote: > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Wendy Cheng > > Sent: Monday, June 12, 2006 12:26 AM > > To: nfs at lists.sourceforge.net > > Cc: linux-cluster at redhat.com > > Subject: [Linux-cluster] [RFC] NLM lock failover admin interface Jon, Thank you for review this - it helps ! -- Wendy > > > > 1. /proc interface, say writing the fsid into a /proc directory entry > > would end up dropping all NLM locks associated with the NFS > > export that > > has fsid in its /etc/exports file. > > This would defintely have it's advantages for people who know what > they're doing - they could drop all locks without unexporting the > filesystem. However, it also gives people the opportunity to shoot > themselves in the foot - by eliminating locks that are needed. After > weighing the pros and cons, I really don't think that any method > accessible via /proc is a good idea. > > > > > 2. Adding a new flag into "exportfs" command, say "h", such that > > > > "exportfs -uh *:/export_path" > > > > would un-export the entry and drop the NLM locks associated with the > > entry. > > > > This is the best of the three, IMHO. Gives you the safety of *knowing* > that the filesystem was unexported before dropping the locks, and > preventing folks from shooting themselves in the foot. > > The other option that was mentioned, a separate lockd for each fs, is > also a good idea - but would require a lot of coding no doubt, and > introduce more instability into what I already preceive as an unstable > NFS subsystem in Linux (I *refuse* to use Linux as an NFS server and > instead go with Solaris - I've had *really* bad experiences with Linux > NFS under load - but that's getting OT). > > > _______________________________________________ > NFS maillist - NFS at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nfs From fabbione at fabbione.net Tue Jun 13 06:22:51 2006 From: fabbione at fabbione.net (Fabio Massimo Di Nitto) Date: Tue, 13 Jun 2006 08:22:51 +0200 Subject: [Linux-cluster] [PATCH] Fix build failure for gettid.c Message-ID: <448E59BB.3080503@fabbione.net> diff -urNad --exclude=CVS --exclude=.svn ./rgmanager/src/clulib/gettid.c /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/c lulib/gettid.c --- ./rgmanager/src/clulib/gettid.c 2005-06-21 20:07:33.000000000 +0200 +++ /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/clulib/gettid.c 2005-07-06 06:40:22.000000000 +0200 @@ -1,7 +1,9 @@ #include +#include #include #include #include +#include /* Patch from Adam Conrad / Ubuntu: Don't use _syscall macro */ This applies to the above in the STABLE branch and HEAD. The same fix needs to be propagated to cman/qdisk/gettid.c for HEAD (yay for duplicate code ;)). Fabio -- I'm going to make him an offer he can't refuse. 
From wcheng at redhat.com Tue Jun 13 07:00:11 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Tue, 13 Jun 2006 03:00:11 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <17550.11870.186706.36949@cse.unsw.edu.au> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> Message-ID: <1150182012.27203.42.camel@localhost.localdomain> On Tue, 2006-06-13 at 13:17 +1000, Neil Brown wrote: > So: > I think if we really want to "remove all NFS locks on a filesystem", > we could probably tie it into umount - maybe have lockd register some > callback which gets called just before s_op->umount_begin. The "umount_begin" idea was one time on my list but got discarded. The thought was that nfsd was not a filesystem, neither was lockd. How to register something with VFS umount for non-filesystem kernel modules ? Invent another autofs-like pseudo filesystem ? Mostly, not every filesystem would like to get un-mounted upon failover (GFS, for example, does not get un-mounted by our cluster suite upon failover). > If we want to remove all locks that arrived on a particular > interface, then we should arrange to do exactly that. There are a > number of different options here. > One is the multiple-lockd-threads idea. Certainly a good option. To make it happen, we still need admin interface. How to pass IP address from user mode into kernel - care to give this some suggestions if you have them handy ? Should socket ports get dynamics assigned ? Will we have scalibility issues ? > One is to register a callback when an interface is shut down. > Another (possibly the best) is to arrange a new signal for lockd > which say "Drop any locks which were sent to IP addresses that are > no longer valid local addresses". These, again, give individual filesystem no freedom to adjust what they need upon failover. But I'll check them out this week - maybe there are good socket layer hooks that I overlook. > > So those are my thoughts. Do any of them seem reasonable to you? > The comments are greatly appreciated. And hopefully we can reach agreement soon. -- Wendy From mdl at veles.ru Tue Jun 13 08:23:00 2006 From: mdl at veles.ru (Denis Medvedev) Date: Tue, 13 Jun 2006 12:23:00 +0400 Subject: [Linux-cluster] Postgresql under RHCS4 In-Reply-To: References: <4462F013.40201@gmail.com> <446D734C.60308@gmail.com> <44716765.1000000@gmail.com> Message-ID: <448E75E4.1020001@veles.ru> Devrim GUNDUZ wrote: > > Hi, > > On Mon, 22 May 2006, carlopmart wrote: > >> No Devrim, I mean which can be the best form to setup replication >> between master and slave ... and when master goes down, if I put data >> on slave how can I update master node. > > > You don't need a replication system there. Use an SAN :) > > Here is the schema: > > +-------+ +-------+ > |Master | |Slave | > | | | | > |Node | |Node | > +-------+ +-------+ > | | > -------------------------- > | > | > +-------------- > | | > | SAN or NAS | > | | > +-------------+ > > > You will install just operating system,PostgreSQL binaries and Cluster > tools to both Master and Slave nodes. All $PGDATA will reside in the > storage. > > If master node goes down, Slave node will mount $PGDATA and continue > working. When master node is up, slave will umount $PGDATA, stop its > postmaster and will trigger master node and start its postmaster so > that data will not be corrupted. And now we have a SPOF - SAN or NAS... how to get beyond that? 
> > Regards, > -- > Devrim GUNDUZ > Kivi Bili?im Teknolojileri - http://www.kivi.com.tr > devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr > http://www.gunduz.org > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > From carlopmart at gmail.com Tue Jun 13 08:23:53 2006 From: carlopmart at gmail.com (C. L. Martinez) Date: Tue, 13 Jun 2006 10:23:53 +0200 Subject: [Linux-cluster] Problems with ccsd Message-ID: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> Hi all, I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries to start returns me this error: [root at srvimss1 init.d]# ccsd Failed to connect to cluster manager. Hint: Magma plugins are not in the right spot. How can I fix this?? Where is the problem?? My cluster.conf: -- C.L. Martinez clopmart at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lvermeiren at core-it.be Tue Jun 13 08:28:02 2006 From: lvermeiren at core-it.be (lvermeiren at core-it.be) Date: Tue, 13 Jun 2006 10:28:02 +0200 Subject: [Linux-cluster] Postgresql under RHCS4 Message-ID: <200606130828.k5D8S2Ut023931@outmx003.isp.belgacom.be> > > On Mon, 22 May 2006, carlopmart wrote: > >> No Devrim, I mean which can be the best form to setup replication > >> between master and slave ... and when master goes down, if I put data > >> on slave how can I update master node. > > > > You don't need a replication system there. Use an SAN :) > And now we have a SPOF - SAN or NAS... how to get beyond that? Redundant power, cabling, switches, controllers + raid >= 1, preferably with some hotspares configured. -------------- next part -------------- An HTML attachment was scrubbed... URL: From riaan at obsidian.co.za Tue Jun 13 09:44:25 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Tue, 13 Jun 2006 11:44:25 +0200 (SAST) Subject: [Linux-cluster] Postgresql under RHCS4 In-Reply-To: <448E75E4.1020001@veles.ru> Message-ID: > > > > You don't need a replication system there. Use an SAN :) > > > > And now we have a SPOF - SAN or NAS... how to get beyond that? > > > If you do not have or want a SAN, you can investigate one of the replicated file systems or block devices. Here are some: - Radiant Data PeerFS (we had a customer running this but had too many problems and had to disable it.) - NetVault Replicator (http://www.bakbone.com/products/replication/) - looks pretty good, but havent tried it myself. They even have a whitepaper on using Replicator with GFS on the above page. - DRBD (Distributed Replicated Block Device) - www.drdb.org and with commercial support at http://www.linbit.com/linhac_drbd.html?L=1 They are my preferred choice (based on a hands-off evaluation) since they are OSS, but the documentation is not quite there yet, relative to the other two. If anyone has had good/bad experiences with these or any other replicated block devices / file systems, myself (and I am sure the original author) would like to hear them. I have a significant number of smaller customers who want HA and RHCS, but a SAN is just not a cost-effective option in those environments. 
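If DRBD is the route chosen, driving it day to day is fairly small. A minimal sketch, assuming a resource named r0 has already been defined in /etc/drbd.conf on both nodes (resource name, device and mount point are placeholders):

  drbdadm up r0                 # attach the backing disk and connect to the peer (run on both nodes)
  drbdadm primary r0            # on the node that should carry the filesystem
  mount /dev/drbd0 /mnt/data
  cat /proc/drbd                # quick connection/sync state check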
Riaan From gforte at leopard.us.udel.edu Tue Jun 13 12:39:18 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Tue, 13 Jun 2006 08:39:18 -0400 Subject: [Linux-cluster] clurgmgrd logging wierdness Message-ID: <448EB1F6.4000107@leopard.us.udel.edu> Yesterday I added new , , and sections to my cluster.conf (for a failover-able samba service, though I don't think that's relevant). I incremented the version, and ran ccs_tool update and cman_tool version -r. Today I noticed that the only status checks being logged in /var/log/messages were the ones for the smb service on the node running it. Prior to my changes, all status checks were being logged on both nodes. All cluster services were still running properly, but it looks like the status checks on everything but the smb service stopped. After forcing a restart of rgmanager on both nodes, status checks (or at least logging of them) is back to normal. Is this a bug, or am I missing something? -g From cjk at techma.com Tue Jun 13 13:00:46 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Tue, 13 Jun 2006 09:00:46 -0400 Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem.... Message-ID: Ok, this is a doozie, someone used a standard fsck against my GFS filesystem from a single node. I can, for some reason still access the filesystem from other nodes and things look ok. I've never had this happen to my systems before and quite frankly am at a loss as to what my options are. The node from which the fsck was run, hangs when tryng to mount the filesystem so I believe it's a problem with the journals. Is there a way to recover from this other then completely rebuilding the filesystem? Corey -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Tue Jun 13 13:47:53 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 13 Jun 2006 08:47:53 -0500 Subject: [Linux-cluster] Problems with ccsd In-Reply-To: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> Message-ID: <448EC209.1070905@redhat.com> C. L. Martinez wrote: > Hi all, > > I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries > to start returns me this error: > > [root at srvimss1 init.d]# ccsd > Failed to connect to cluster manager. > Hint: Magma plugins are not in the right spot. > > How can I fix this?? Where is the problem?? > > -- > C.L. Martinez > clopmart at gmail.com Hi C.L. I don't know about your particular scenario, but, every time I've gotten this message in the past, it's meant that I build the Red Hat Cluster Suite by hand (i.e. compiling it, not adding it with RPMs or up2date, etc.) and somehow did something wrong. The solution I've used to fix it is: cd cluster; make uninstall; make distclean; ./configure; make install (Assuming the cluster suite source resides in directory "cluster"). If you installed from RPMs or through the GUI, this isn't the solution, in which case you should let me know and I'll see what I can do. I hope this helps. 
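If the packages came from RPMs instead, a quick sanity check is whether the plugin package is installed and its plugins sit where ccsd expects them (the plugin path below is from memory and may differ by release or architecture):

  rpm -q magma magma-plugins ccs cman rgmanager
  ls /usr/lib/magma/plugins/    # the cluster-manager plugins the hint is complaining about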
Regards, Bob Peterson Red Hat Cluster Suite From lhh at redhat.com Tue Jun 13 13:39:49 2006 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 13 Jun 2006 09:39:49 -0400 Subject: [Linux-cluster] [PATCH] Fix build failure for gettid.c In-Reply-To: <448E59BB.3080503@fabbione.net> References: <448E59BB.3080503@fabbione.net> Message-ID: <1150205989.20766.227.camel@ayanami.boston.redhat.com> On Tue, 2006-06-13 at 08:22 +0200, Fabio Massimo Di Nitto wrote: > diff -urNad --exclude=CVS --exclude=.svn ./rgmanager/src/clulib/gettid.c > /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/c > lulib/gettid.c > --- ./rgmanager/src/clulib/gettid.c 2005-06-21 20:07:33.000000000 +0200 > +++ > /usr/src/dpatchtemp/dpep-work.JDJ8Uk/redhat-cluster-suite-1.20050706/rgmanager/src/clulib/gettid.c > 2005-07-06 06:40:22.000000000 +0200 > @@ -1,7 +1,9 @@ > #include > +#include > #include > #include > #include > +#include > > /* Patch from Adam Conrad / Ubuntu: Don't use _syscall macro */ Looks good, I'll put it in today. -- Lon From neilb at suse.de Tue Jun 13 03:17:50 2006 From: neilb at suse.de (Neil Brown) Date: Tue, 13 Jun 2006 13:17:50 +1000 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: message from Wendy Cheng on Monday June 12 References: <1150089943.26019.18.camel@localhost.localdomain> Message-ID: <17550.11870.186706.36949@cse.unsw.edu.au> On Monday June 12, wcheng at redhat.com wrote: > NFS v2/v3 active-active NLM lock failover has been an issue with our > cluster suite. With current implementation, it (cluster suite) is trying > to carry the workaround as much as it can with user mode scripts where, > upon failover, on taken-over server, it: > > 1. Tear down virtual IP. > 2. Unexport the subject NFS export. > 3. Signal lockd to drop the locks. > 4. Un-mount filesystem if needed. > ... > we would > like to be able to selectively drop locks (only) associated with the > requested exports without disrupting other NFS services. There seems to be an unstated assumption here that there is one virtual IP per exported filesystem. Is that true? Assuming it is and that I understand properly what you want to do.... I think that maybe the right thing to do is *not* drop the locks on a particular filesystem, but to drop the locks made to a particular virtual IP. Then it would make a lot of sense to have one lockd thread per IP, and signal the lockd in order to drop the locks. True: that might be more code. But if it is the right thing to do, then it should be done that way. On the other hand, I can see a value in removing all the locks for a particular filesytem quite independent of failover requirements. If I want to force-unmount a filesystem, I need to unexport it, and I need to kill all the locks. Currently you can only remove locks from all filesystems, which might not be ideal. I'm not at all keen on the NFSEXP_FOLOCK flag to exp_unexport, as that is an interface that I would like to discard eventually. The preferred mechanism for exporting filesystems is to flush the appropriate 'cache', and allow it to be repopulated with whatever is still valid via upcalls to mountd. So: I think if we really want to "remove all NFS locks on a filesystem", we could probably tie it into umount - maybe have lockd register some callback which gets called just before s_op->umount_begin. If we want to remove all locks that arrived on a particular interface, then we should arrange to do exactly that. There are a number of different options here. 
One is the multiple-lockd-threads idea. One is to register a callback when an interface is shut down. Another (possibly the best) is to arrange a new signal for lockd which say "Drop any locks which were sent to IP addresses that are no longer valid local addresses". So those are my thoughts. Do any of them seem reasonable to you? NeilBrown From neilb at suse.de Tue Jun 13 07:08:13 2006 From: neilb at suse.de (Neil Brown) Date: Tue, 13 Jun 2006 17:08:13 +1000 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: message from Wendy Cheng on Tuesday June 13 References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150182012.27203.42.camel@localhost.localdomain> Message-ID: <17550.25693.507553.731606@cse.unsw.edu.au> On Tuesday June 13, wcheng at redhat.com wrote: > > One is to register a callback when an interface is shut down. > > Another (possibly the best) is to arrange a new signal for lockd > > which say "Drop any locks which were sent to IP addresses that are > > no longer valid local addresses". > > These, again, give individual filesystem no freedom to adjust what they > need upon failover. But I'll check them out this week - maybe there are > good socket layer hooks that I overlook. > Can you say more about what sort of adjustments an individual filesystem might want the freedom to make? It might help me understand the issues better. Thanks, NeilBrown From brentonr at dorm.org Tue Jun 13 14:10:27 2006 From: brentonr at dorm.org (Brenton Rothchild) Date: Tue, 13 Jun 2006 09:10:27 -0500 Subject: [Linux-cluster] Problems with ccsd In-Reply-To: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> Message-ID: <448EC753.1010505@dorm.org> IIRC, I've seen this when the magma-plugins RPM wasn't installed, if you're using RPMS that is :) -Brenton Rothchild C. L. Martinez wrote: > Hi all, > > I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries to > start returns me this error: > > [root at srvimss1 init.d]# ccsd > Failed to connect to cluster manager. > Hint: Magma plugins are not in the right spot. > > How can I fix this?? Where is the problem?? > > My cluster.conf: > > > > > > > > > nodename="srvimss1"/> > > > > > > > nodename="srvimss2"/> > > > > > > > servers="srvmgmt"/> > > > > restricted="1"> > priority="1"/> > priority="2"/> > > restricted="1"> > priority="2"/> > priority="1"/> > > > > > > > -- > C.L. Martinez > clopmart at gmail.com > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From teigland at redhat.com Tue Jun 13 14:17:46 2006 From: teigland at redhat.com (David Teigland) Date: Tue, 13 Jun 2006 09:17:46 -0500 Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem.... In-Reply-To: References: Message-ID: <20060613141746.GA20730@redhat.com> On Tue, Jun 13, 2006 at 09:00:46AM -0400, Kovacs, Corey J. wrote: > Ok, this is a doozie, someone used a standard fsck against my GFS filesystem > from a single node. > I can, for some reason still access the filesystem from other nodes and > things look ok. I've never had > this happen to my systems before and quite frankly am at a loss as to what my > options are. The node > from which the fsck was run, hangs when tryng to mount the filesystem so I > believe it's a problem with > the journals. 
Is there a way to recover from this other then completely > rebuilding the filesystem? I would unmount all the nodes and have one run gfs_fsck. Dave From lhh at redhat.com Tue Jun 13 14:21:42 2006 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 13 Jun 2006 10:21:42 -0400 Subject: [Linux-cluster] clurgmgrd logging wierdness In-Reply-To: <448EB1F6.4000107@leopard.us.udel.edu> References: <448EB1F6.4000107@leopard.us.udel.edu> Message-ID: <1150208502.20766.230.camel@ayanami.boston.redhat.com> On Tue, 2006-06-13 at 08:39 -0400, Greg Forte wrote: > Yesterday I added new , , and > sections to my cluster.conf (for a failover-able samba service, though I > don't think that's relevant). I incremented the version, and ran > ccs_tool update and cman_tool version -r. > > Today I noticed that the only status checks being logged in > /var/log/messages were the ones for the smb service on the node running > it. Prior to my changes, all status checks were being logged on both > nodes. All cluster services were still running properly, but it looks > like the status checks on everything but the smb service stopped. > > After forcing a restart of rgmanager on both nodes, status checks (or at > least logging of them) is back to normal. > > Is this a bug, or am I missing something? It might be a bug, but there's not enough information to tell right now. Did the services remain in the 'started' state, or did one or more get stopped for some reason after the transition? -- Lon From cjk at techma.com Tue Jun 13 14:29:19 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Tue, 13 Jun 2006 10:29:19 -0400 Subject: [Linux-cluster] Someone FSCK'd my GFS filesystem.... In-Reply-To: <20060613141746.GA20730@redhat.com> Message-ID: Ok, next time I'll take a better look at my logs.... I rebooted the node in question while watching logs of another. problem node booted and logs reflected that gfs wasn't letting it join cuz it was expired (seems this cluster has other problems). I rebooted all nodes and all nodes joined the cluster. So, after examining my fencing config, it appears that my problem node does not have it's fence device configured properly. I can see the filesystem from all nodes and things look OK. I will definitely be doing a "gfs_fsck" against the fs though just in case ,filesystems unmounted of course :) Once more lesson in getting clear details when trying to fix problems induced by others Cheers Corey -----Original Message----- From: David Teigland [mailto:teigland at redhat.com] Sent: Tuesday, June 13, 2006 10:18 AM To: Kovacs, Corey J. Cc: linux-cluster at redhat.com Subject: Re: [Linux-cluster] Someone FSCK'd my GFS filesystem.... On Tue, Jun 13, 2006 at 09:00:46AM -0400, Kovacs, Corey J. wrote: > Ok, this is a doozie, someone used a standard fsck against my GFS > filesystem from a single node. > I can, for some reason still access the filesystem from other nodes > and things look ok. I've never had this happen to my systems before > and quite frankly am at a loss as to what my options are. The node > from which the fsck was run, hangs when tryng to mount the filesystem > so I believe it's a problem with the journals. Is there a way to > recover from this other then completely rebuilding the filesystem? I would unmount all the nodes and have one run gfs_fsck. 
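(As a minimal sketch of that sequence -- the pool device and mount point below are made up -- unmount everywhere, fsck from exactly one node:

  umount /mnt/gfs                  # on every node in the cluster
  gfs_fsck /dev/pool/my_pool       # from one node only; add -y to auto-answer repair prompts
  mount /mnt/gfs                   # remount everywhere once it finishes cleanly
)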
Dave From gforte at leopard.us.udel.edu Tue Jun 13 14:27:33 2006 From: gforte at leopard.us.udel.edu (Greg Forte) Date: Tue, 13 Jun 2006 10:27:33 -0400 Subject: [Linux-cluster] clurgmgrd logging wierdness In-Reply-To: <1150208502.20766.230.camel@ayanami.boston.redhat.com> References: <448EB1F6.4000107@leopard.us.udel.edu> <1150208502.20766.230.camel@ayanami.boston.redhat.com> Message-ID: <448ECB55.7030701@leopard.us.udel.edu> Lon Hohberger wrote: > On Tue, 2006-06-13 at 08:39 -0400, Greg Forte wrote: >> Yesterday I added new , , and >> sections to my cluster.conf (for a failover-able samba service, though I >> don't think that's relevant). I incremented the version, and ran >> ccs_tool update and cman_tool version -r. >> >> Today I noticed that the only status checks being logged in >> /var/log/messages were the ones for the smb service on the node running >> it. Prior to my changes, all status checks were being logged on both >> nodes. All cluster services were still running properly, but it looks >> like the status checks on everything but the smb service stopped. >> >> After forcing a restart of rgmanager on both nodes, status checks (or at >> least logging of them) is back to normal. >> >> Is this a bug, or am I missing something? > > It might be a bug, but there's not enough information to tell right now. > Did the services remain in the 'started' state, or did one or more get > stopped for some reason after the transition? AFAIK, none of the services ever left the 'started' state, except for the samba service which got started right after the update. In fact, they couldn't have stopped, because if any of them had my monitoring agent on another box would've squawked. -g From malexand at wu-wien.ac.at Tue Jun 13 17:32:46 2006 From: malexand at wu-wien.ac.at (Michael Alexander) Date: Tue, 13 Jun 2006 19:32:46 +0200 Subject: [Linux-cluster] CfP Workshop on XEN in HPC Cluster and Grid Computing Environments (XHPC) Message-ID: =============================================================== CALL FOR PAPERS (XHPC'06) Workshop on XEN in High-Performance Cluster and Grid Computing Environments as part of: The Fourth International Symposium on Parallel and Distributed Processing and Applications (ISPA'2006). Sorrento, Italy =============================================================== Date: 1-4 December 2006 ISPA'2006: http://www.ispa-conference.org/2006/ Workshop URL: http://xhpc.ai.wu-wien.ac.at/ws/ (due date: August 4, Abstracts Jul 17) Scope: The Xen virtual machine monitor is reaching wide-spread adoption in a variety of operating systems as well as scientific educational and operational usage areas. With its low overhead, Xen allows for concurrently running large numbers of virtual machines, providing each encapsulation, isolation and network-wide CPU migratability. Xen offers a network-wide abstraction layer of individual machine resources to OS environments, thereby opening whole new cluster-and grid high-performance computing (HPC) architectures and HPC services options. With Xen finding applications in HPC environments, this workshop aims to bring together researchers and practitioners active on Xen in high-performance cluster and grid computing environments. The workshop will be one day in length, composed of 20 min paper presentations, each followed by 10 min discussion sections. Presentations may be accompanied with interactive demonstrations. The workshop will end with a 30 min panel discussion by presenters. 
TOPICS Topics include, but are not limited to, the following subject matters: - Xen in cluster and grid environments - Workload characterizations for Xen-based clusters - Xen cluster and grid architectures - Cluster reliability, fault-tolerance, and security - Compute job entry and scheduling - Compute workload load levelling - Cluster and grid filesystems for Xen - Research and education use cases - VM cluster distribution algorithms - MPI, PVM on virtual machines - System sizing - High-speed interconnects in Xen - Xen extensions and utilities for cluster and grid computing - Network architectures for Xen clusters - Xen on large SMP machines - Measuring performance - Performance tuning of Xen domains - Xen performance tuning on various load types - Xen cluster/grid tools - Management of Xen clusters PAPER SUBMISSION Papers submitted to each workshop will be reviewed by at least three members of the program committee and external reviewers. Submissions should include abstract, key words, the e-mail address of the corresponding author, and must not exceed 15 pages, including tables and figures, and preferably be in LaTeX or FrameMaker, although submissions in the LNCS Word format will be accepted as well. Electronic submission through the submission website is strongly encouraged. Hardcopies will be accepted only if electronic submission is not possible. Submission of a paper should be regarded as a commitment that, should the paper be accepted, at least one of the authors will register and attend the conference to present the work. An award for best student paper will be given. http://isda2006.ujn.edu.cn/isda/author/submit.php Format should be according to the Springer LNCS Style http://www.springer.de/comp/lncs/authors.html It is expected that the proceedings of the workshop programs will be published by Springer's LNCS series or IEEE CS. IMPORTANT DATES July 17, 2006 - Abstract submissions due Paper submission due: August 4, 2006 Acceptance notification: September 1, 2006 Camera-ready due: September 20, 2006 Conference: December 1-4, 2006 CHAIR Michael Alexander (chair), WU Vienna, Austria Geyong Min (co-chair), University of Bradford, UK Gudula Ruenger (co-chair), Chemnitz University of Technology, Germany PROGRAM COMMITTEE Franck Cappello, CNRS-Universit? Paris-Sud, France Claudia Eckert, Fraunhofer-Institute, Germany Rob Gardner, HP Labs, USA Marcus Hardt, Forschungszentrum Karlsruhe, Germany Sverre Jarp, CERN, Switzerland Thomas Lange, University of Cologne, Germany Ronald Luijten, IBM Research Laboratory, Zurich, Switzerland Klaus Ita, WU Vienna, Austria Franco Travostino, Nortel CTO Office, USA Andreas Unterkircher, CERN, Switzerland GENERAL INFORMATION This workshop will be held as part of ISPA 2006 in Sorrento, Italy - http://www.sorrentoinfo.com/sorrento/sorrento_italy.asp A pre-conference trip to the ESA ESRIN facility in Frascati on November 30 will be organized. From rpeterso at redhat.com Tue Jun 13 18:09:30 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 13 Jun 2006 13:09:30 -0500 Subject: [Linux-cluster] New Mailing List: cluster-devel Message-ID: <448EFF5A.9060604@redhat.com> Hi Cluster People, Lately, a few of us have been sending cluster development patches and such to linux-cluster for things like gfs2. We recently decided this was too much "noise" for this mailing list. Still, we wanted to keep everyone here informed and in the development loop so everyone has a chance to contribute and participate. 
So we created a new public mailing list called "cluster-devel". You can subscribe to cluster-devel from this web page: https://www.redhat.com/mailman/listinfo/cluster-devel All CVS commit messages are automatically sent there and they contain a diff of the changes, and that makes it easier to see the changes and comment on them. I encourage everyone to subscribe to the new cluster-devel mailing list, so you can submit patches, make suggestions, read about the latest development efforts, tell us where we fall short, or stay informed regarding cluster development issues. I'll still use linux-cluster as an open forum for general discussion and solving clustering issues and problems, but I'll try to move my development issues over to cluster-devel. Regards, Bob Peterson Red Hat Cluster Suite From wcheng at redhat.com Wed Jun 14 06:54:51 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Wed, 14 Jun 2006 02:54:51 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <17550.11870.186706.36949@cse.unsw.edu.au> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> Message-ID: <1150268091.28264.75.camel@localhost.localdomain> Hi, KABI (kernel application binary interface) commitment is a big thing from our end - so I would like to focus more on the interface agreement before jumping into coding and implementation details. > One is the multiple-lockd-threads idea. Assume we still have this on the table.... Could I expect the admin interface goes thru rpc.lockd command (man page and nfs-util code changes) ? The modified command will take similar options as rpc.statd; more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the individual IP (socket address) to kernel, we'll need nfsctl with struct nfsctl_svc modified. For the kernel piece, since we're there anyway, could we have the individual lockd IP interface passed to SM (statd) (in SM_MON call) ? This would allow statd to structure its SM files based on each lockd IP address, an important part of lock recovery. > One is to register a callback when an interface is shut down. Haven't checked out (linux) socket interface yet. I'm very fuzzy how this can be done. Anyone has good ideas ? > Another (possibly the best) is to arrange a new signal for lockd > which say "Drop any locks which were sent to IP addresses that are > no longer valid local addresses". Very appealing - but the devil's always in the details. How to decide which IP address is no longer valid ? Or how does lockd know about these IP addresses ? And how to associate one particular IP address with the "struct nlm_file" entries within nlm_files list ? Need few more days to sort this out (or any one already has ideas in mind ?). -- Wendy From riaan at obsidian.co.za Wed Jun 14 11:27:25 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Wed, 14 Jun 2006 13:27:25 +0200 (SAST) Subject: [Linux-cluster] Red Hat Summit presentations Message-ID: For anyone interested in the Red Hat Summit presentations, they are available on-line now. 
The presentations on Clustering and Storage are available here: http://www.redhat.com/promo/summit/presentations/cns.htm Riaan From hch at infradead.org Wed Jun 14 11:36:05 2006 From: hch at infradead.org (Christoph Hellwig) Date: Wed, 14 Jun 2006 12:36:05 +0100 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <1150268091.28264.75.camel@localhost.localdomain> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> Message-ID: <20060614113605.GA28158@infradead.org> On Wed, Jun 14, 2006 at 02:54:51AM -0400, Wendy Cheng wrote: > Hi, > > KABI (kernel application binary interface) commitment is a big thing > from our end - so I would like to focus more on the interface agreement > before jumping into coding and implementation details. Please stop this crap now. If zou don't get that there is no kernel internal ABI and there never will be get a different job ASAP. From wcheng at redhat.com Wed Jun 14 13:39:04 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Wed, 14 Jun 2006 09:39:04 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <20060614113605.GA28158@infradead.org> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <20060614113605.GA28158@infradead.org> Message-ID: <1150292344.28264.87.camel@localhost.localdomain> On Wed, 2006-06-14 at 12:36 +0100, Christoph Hellwig wrote: > On Wed, Jun 14, 2006 at 02:54:51AM -0400, Wendy Cheng wrote: > > Hi, > > > > KABI (kernel application binary interface) commitment is a big thing > > from our end - so I would like to focus more on the interface agreement > > before jumping into coding and implementation details. > > Please stop this crap now. If zou don't get that there is no kernel internal > ABI and there never will be get a different job ASAP. Actually I don't quite understand this statement (sorry! English is not my native language) but it is ok. People are entitled for different opinions and I respect yours. On the technical side, just a pre-cautious, in case we need to touch some kernel export symbols so it would be nice to have external (and admin) interfaces decided before we start to code. So I'll not talk about this and I assume we can keep focusing on NLM issues. No more noises from each other. Fair ? -- Wendy From wcheng at redhat.com Wed Jun 14 14:00:54 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Wed, 14 Jun 2006 10:00:54 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <1150268091.28264.75.camel@localhost.localdomain> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> Message-ID: <1150293654.28264.91.camel@localhost.localdomain> On Wed, 2006-06-14 at 02:54 -0400, Wendy Cheng wrote: > > Assume we still have this on the table.... Could I expect the admin > interface goes thru rpc.lockd command (man page and nfs-util code > changes) ? The modified command will take similar options as rpc.statd; > more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the > individual IP (socket address) to kernel, we'll need nfsctl with struct > nfsctl_svc modified. I want to make sure people catch this. Here we're talking about NFS system call interface changes. 
We need either a new NFS syscall or altering the existing nfsctl_svc structure. -- Wendy > > For the kernel piece, since we're there anyway, could we have the > individual lockd IP interface passed to SM (statd) (in SM_MON call) ? > This would allow statd to structure its SM files based on each lockd IP > address, an important part of lock recovery. > > > One is to register a callback when an interface is shut down. > > Haven't checked out (linux) socket interface yet. I'm very fuzzy how > this can be done. Anyone has good ideas ? > > > Another (possibly the best) is to arrange a new signal for lockd > > which say "Drop any locks which were sent to IP addresses that are > > no longer valid local addresses". > > Very appealing - but the devil's always in the details. How to decide > which IP address is no longer valid ? Or how does lockd know about these > IP addresses ? And how to associate one particular IP address with the > "struct nlm_file" entries within nlm_files list ? Need few more days to > sort this out (or any one already has ideas in mind ?). > > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rainer at ultra-secure.de Wed Jun 14 15:05:19 2006 From: rainer at ultra-secure.de (Rainer Duffner) Date: Wed, 14 Jun 2006 17:05:19 +0200 Subject: [Linux-cluster] Red Hat Summit presentations In-Reply-To: References: Message-ID: <449025AF.8070602@ultra-secure.de> Riaan van Niekerk wrote: > For anyone interested in the Red Hat Summit presentations, they are > available on-line now. > > The presentations on Clustering and Storage are > available here: http://www.redhat.com/promo/summit/presentations/cns.htm > > Riaan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Hello, The "NFS with Linux"-link doesn't work. Rainer From bobby.m.dalton at nasa.gov Wed Jun 14 17:09:54 2006 From: bobby.m.dalton at nasa.gov (Dalton, Maurice) Date: Wed, 14 Jun 2006 12:09:54 -0500 Subject: [Linux-cluster] Clumanager RHEL3 Message-ID: I am running a 2 system cluster with NFS services. kernel-smp-2.4.21-40.EL clumanager-1.2.31-1 nfs-utils-1.0.6-43EL In my /var/log/message I have several of messages saying: clusvcmgrd[7960]: Starvation on Lock #4! Anyone know what this means? -------------- next part -------------- An HTML attachment was scrubbed... URL: From smartjoe at gmail.com Wed Jun 14 18:49:19 2006 From: smartjoe at gmail.com (jOe) Date: Thu, 15 Jun 2006 02:49:19 +0800 Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with new fencing mechanisms? Message-ID: Hello all, Sorry if this is a stupid question. I deploy both HP MC/SG linux edition and RHCS for our customers. I just wondered why the latest RHCS remove quorum partition/lock lun with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN switch....)? Lots of our customers choosed HP's sophisticated MC/SG linux edition for their mission critical system in Two Node Cluster Configuration. From our monthly health check service and customers' feedback, i do think HP SGLX is reliable and stable, even under heavy I/O traffic, the lock lun(quorum disk) works pretty good. And the whole cluster architecture is simple and clean, at same time means less issue and problem . I do think Redhat's product team is strong and obviously have their solid reasons to choose new mechanisms in RHCS v4. 
I've investigated and i can understand that quorum disk/lock lun in two node cluster configuration "Might Bring" more latency and impact the cluster but according to my previous words, i'm sure that it is pretty stable to use lock lun/quorum partition of HP SG/LX even under heavy I/O loads. I have no intention to start a comparison between HP SGLX and RedHat RHCS, All i want to get clear is quorum disk/lock lun Vs RHCS's new fencing mechanisms. Regards, Jun -------------- next part -------------- An HTML attachment was scrubbed... URL: From riaan at obsidian.co.za Wed Jun 14 20:05:40 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Wed, 14 Jun 2006 22:05:40 +0200 (SAST) Subject: [Linux-cluster] Red Hat Summit presentations In-Reply-To: <449025AF.8070602@ultra-secure.de> Message-ID: > > For anyone interested in the Red Hat Summit presentations, they are > > available on-line now. > > The presentations on Clustering and > Storage are > available here: > http://www.redhat.com/promo/summit/presentations/cns.htm > > > Hello, > > The "NFS with Linux"-link doesn't work. > The same paper is available under the Security track, and that link is working. I will pass on the broken link. From lhh at redhat.com Wed Jun 14 21:26:29 2006 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 Jun 2006 17:26:29 -0400 Subject: [Linux-cluster] Clumanager RHEL3 In-Reply-To: References: Message-ID: <1150320389.20766.301.camel@ayanami.boston.redhat.com> On Wed, 2006-06-14 at 12:09 -0500, Dalton, Maurice wrote: > I am running a 2 system cluster with NFS services. > > kernel-smp-2.4.21-40.EL > clumanager-1.2.31-1 > nfs-utils-1.0.6-43EL > > In my /var/log/message I have several of messages saying: > > clusvcmgrd[7960]: Starvation on Lock #4! High system load or slow shared storage causing a node to not be able to obtain a lock in a timely manner. It retries, though, so if no other problems occur, it can usually be ignored. -- Lon From akornev at gmail.com Wed Jun 14 22:43:25 2006 From: akornev at gmail.com (Anton Kornev) Date: Thu, 15 Jun 2006 01:43:25 +0300 Subject: [Linux-cluster] GFS locking issues Message-ID: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> Hi, I have some locking issues (deadlocks?) with GFS. My configuration include 4 hosts - one of them is used as GNBD-device exporter and 3 other import this GNBD partition and mount it to the /gfs mountpoint. LVM is also used on the imported GNBD partition, so clmvd is running. The locking method is DLM, GFS version is 6.1.5, manual fencing used. The problem is quite usual - deadlock on httpd (httpd processess in 'D' state) I saw such problems, though not solutions on the list. In my case apache is placed to the GFS filesystem and I run it inside th chroot by the command like this: chroot /gfs/chroot /usr/local/apache/bin/httpd The problem appears sometimes after "killall httpd" - all the httpd processes get the 'D' state in "ps ax" terms and become locked in this state forever. Moreover the whole GFS filesystem become unavailable after it happens. Even from another host every command that tries to access /gfs partition hangs in the 'D' state. Though last time it was unavailable only partially - the /gfs/chroot/usr hierarchy was "locked" but other parts of gfs worked okay. The only cure I know is to reboot the node and fence it out from the cluster. Is there any ideas of how to fix this? I mean either the reason ('D' state of killed httpd-s) or consequences (the GFS filesystem fully or partially become unavailable after this). 
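One cheap piece of evidence worth collecting before rebooting a hung node is the kernel function each stuck task is sleeping in (its wchan). The stand-alone helper below is purely illustrative - it is not part of GFS or the cluster tools - and simply walks /proc, printing every process in 'D' state together with its wchan; "ps -eo pid,comm,stat,wchan" reports the same information.

/*
 * dstate.c - list processes in uninterruptible sleep ('D') and the
 * kernel function they are blocked in (wchan).  Illustrative helper
 * only; it uses nothing beyond the standard /proc files.
 *
 * Build: cc -o dstate dstate.c
 * Run:   ./dstate
 */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <dirent.h>

static int read_file(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[n] = '\0';
	return 0;
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char path[256], stat[1024], wchan[256];

	if (!proc) {
		perror("/proc");
		return 1;
	}
	while ((de = readdir(proc)) != NULL) {
		char *open_paren, *close_paren, state;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;	/* not a pid directory */

		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		if (read_file(path, stat, sizeof(stat)) < 0)
			continue;	/* process exited meanwhile */

		/* field 2 is "(comm)", field 3 is the one-letter state */
		open_paren = strchr(stat, '(');
		close_paren = strrchr(stat, ')');
		if (!open_paren || !close_paren || close_paren[1] == '\0')
			continue;
		state = close_paren[2];
		if (state != 'D')
			continue;

		snprintf(path, sizeof(path), "/proc/%s/wchan", de->d_name);
		if (read_file(path, wchan, sizeof(wchan)) < 0)
			wchan[0] = '\0';

		*close_paren = '\0';
		printf("pid %-6s %-20s wchan=%s\n",
		       de->d_name, open_paren + 1, wchan);
	}
	closedir(proc);
	return 0;
}

If several of the stuck processes are parked in the same wait routine, that can help tell whether the hang sits in GFS itself, in the lock or GNBD layer, or in the block device underneath.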
I also appreciate any help with debugging the problem. I tried gfs_tool lockdump with decipher_lockstate_dump tool. bash-3.00# ps ax |grep http 14981 ? Ds 0:00 /usr/system/apache/bin/httpd 15242 ? D 0:00 /usr/system/apache/bin/httpd 24708 ? D 0:00 /usr/system/apache/bin/httpd 24709 ? D 0:00 /usr/system/apache/bin/httpd 24710 ? D 0:00 /usr/system/apache/bin/httpd I found only 2 locks regarding these processes: bash-3.00# ls -i /gfs/chroot/lib64/libnss_files-2.3.4.so 27190 /gfs/chroot/lib64/libnss_files-2.3.4.so Glock (inode[2], 27190) gl_flags = lock[1] gl_count = 7 gl_state = shared[3] req_gh = yes req_bh = yes lvb_count = 0 object = yes new_le = no incore_le = no reclaim = no aspace = 1 ail_bufs = no Request owner = 24710 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] holder[6] first[7] Holder owner = 24710 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] holder[6] first[7] Waiter3 owner = 24708 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] Waiter3 owner = 24709 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] Waiter3 owner = 15242 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] Inode: busy and bash-3.00# ls -i /gfs/chroot/usr/system/apache/bin/httpd 2175961 /gfs/chroot/usr/system/apache/bin/httpd Glock (inode[2], 2175961) gl_flags = gl_count = 4 gl_state = shared[3] req_gh = no req_bh = no lvb_count = 0 object = yes new_le = no incore_le = no reclaim = no aspace = 1 ail_bufs = no Holder owner = 14981 gh_state = shared[3] gh_flags = error = 0 gh_iflags = promote[1] holder[6] first[7] Inode: busy There are also such locks for this inodes: Glock (iopen[5], 27190) gl_flags = gl_count = 2 gl_state = shared[3] req_gh = no req_bh = no lvb_count = 0 object = yes new_le = no incore_le = no reclaim = no aspace = no ail_bufs = no Holder owner = none[-1] gh_state = shared[3] gh_flags = local_excl[5] exact[7] error = 0 gh_iflags = promote[1] holder[6] first[7] Glock (iopen[5], 2175961) gl_flags = gl_count = 2 gl_state = shared[3] req_gh = no req_bh = no lvb_count = 0 object = yes new_le = no incore_le = no reclaim = no aspace = no ail_bufs = no Holder owner = none[-1] gh_state = shared[3] gh_flags = local_excl[5] exact[7] error = 0 gh_iflags = promote[1] holder[6] first[7] During the last hanging the "/gfs/chroot/usr" was unavailable and there are two entries regarding this directory in the lockdump: bash-3.00# ls -di /gfs/chroot/usr/ 15077981 /gfs/chroot/usr/ Glock (inode[2], 15077981) gl_flags = gl_count = 4 gl_state = shared[3] req_gh = no req_bh = no lvb_count = 0 object = yes new_le = no incore_le = no reclaim = yes aspace = 1 ail_bufs = no Inode: num = 15077981/15077981 type = directory[2] i_count = 1 i_flags = vnode = yes Glock (iopen[5], 15077981) gl_flags = gl_count = 2 gl_state = shared[3] req_gh = no req_bh = no lvb_count = 0 object = yes new_le = no incore_le = no reclaim = no aspace = no ail_bufs = no Holder owner = none[-1] gh_state = shared[3] gh_flags = local_excl[5] exact[7] error = 0 gh_iflags = promote[1] holder[6] first[7] Your comments will be highly appreciated. -- Best Regards, Anton Kornev. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kanderso at redhat.com Thu Jun 15 02:47:06 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Wed, 14 Jun 2006 21:47:06 -0500 Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with new fencing mechanisms? 
In-Reply-To: References: Message-ID: <1150339626.2982.51.camel@localhost.localdomain> On Thu, 2006-06-15 at 02:49 +0800, jOe wrote: > Hello all, > > Sorry if this is a stupid question. > > I deploy both HP MC/SG linux edition and RHCS for our customers. I > just wondered why the latest RHCS remove quorum partition/lock lun > with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN > switch....)? First off, I don't think it is completely fair to compare quorum partitions to fencing. They really serve different purposes. Quorum partition gives you the ability to maintain the cluster through flakey network spikes. It will keep you from prematurely removing nodes from the cluster. Fencing is really used to provide data integrity of your shared storage devices. You really want to make sure that a node is gone before recovering their data. Just because a node isn't updating the quorum partition, doesn't mean it isn't still scrogging your file systems. However, a combination of the two provides a pretty solid cluster in small configurations. And a quorum disk has another nice feature that is useful. That said, a little history before I get to the punch line. Two clustering technologies were merged together for RHCS 4.x releases and the resulting software used the core cluster infrastructure that was part of the GFS product for both RHCS and RHGFS. GFS didn't have a quorum partition as an option primarily due to scalability reasons. The quorum disk works fine for a limited number of nodes, but the core cluster infrastructure needed to be able to scale to large numbers. The fencing mechanisms provide the ability to ensure data integrity in that type of configuration. So, the quorum disk wasn't carried into the new cluster infrastructure at that time. Good news is we realized the deficiency and have added quorum disk support and it will be part of the RHCS4.4 update release which should be hitting the RHN beta sites within a few days. This doesn't replace the need to have a solid fencing infrastructure in place. When a node fails, you still need to ensure that it is gone and won't corrupt the filesystem. Quorum disk will still have scalability issues and is really targeted at small clusters, ie <16 nodes. This is primarily due to having multiple machines pounding on the same storage device. It also provides an additional feature, the ability to represent a configurable number of votes. If you set the quorum device to have the same number of votes as nodes in the cluster. You can maintain cluster sanity down to a single active compute node in the cluster. We can get rid of our funky special two node configuration option. You will then be able to grow a two node cluster without having to reset. Sorry I rambled a bit.. Thanks Kevin From neilb at suse.de Thu Jun 15 04:27:01 2006 From: neilb at suse.de (Neil Brown) Date: Thu, 15 Jun 2006 14:27:01 +1000 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: message from Wendy Cheng on Wednesday June 14 References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> Message-ID: <17552.57749.121240.42384@cse.unsw.edu.au> On Wednesday June 14, wcheng at redhat.com wrote: > Hi, > > KABI (kernel application binary interface) commitment is a big thing > from our end - so I would like to focus more on the interface agreement > before jumping into coding and implementation details. 
> Before we can agree on an interface, we need to be clear what functionality is required. You started out suggesting that the required functionality was to "remove all locks that lockd holds on a particular filesystem". I responded that I suspect a better functionality was "remove all locks that locked holds on behalf of a particular IP address". You replied that this such an approach > give[s] individual filesystem no freedom to adjust what they > need upon failover. I asked: > Can you say more about what sort of adjustments an individual filesystem > might want the freedom to make? It might help me understand the > issues better. and am still waiting for an answer. Without an answer, I still lean towards and IP-address based approach, and the reply from James Yarbrough seems to support that (though I don't want to read too much into his comments). Lockd is not currently structured to associate locks with server-ip-addresses. There is an assumption that one client may talk to any of the IP addresses that the server supports. This is clearly not the case for the failover scenario that you are considering, so a little restructuring might be in order. Some locks will be held on behalf of a client, no matter what interface the requests arrive on. Other locks will be held on behalf of a client and tied to a particular server IP address. Probably the easiest way to make this distinction in as a new nfsd export flag. So, maybe something like this: Add a 'struct sockaddr_in' to 'struct nlm_file'. If nlm_fopen return (say) 3, then treat is as success, and also copy rqstp->rq_addr into that 'sockaddr_in'. define a new file in the 'nfsd' filesystem into which can be written an IP address and which calls some new lockd function which releases all locks held for that IP address. Probably get nlm_lookup_file to insist that if the sockaddr_in is defined in a lock, it must match the one in rqstp Does that sound OK ? > > One is the multiple-lockd-threads idea. > > Assume we still have this on the table.... Could I expect the admin > interface goes thru rpc.lockd command (man page and nfs-util code > changes) ? The modified command will take similar options as rpc.statd; > more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the > individual IP (socket address) to kernel, we'll need nfsctl with struct > nfsctl_svc modified. I'm losing interest in the multiple-lockd-threads approach myself (for the moment anyway :-) However I would be against trying to re-use rpc.lockd - that was a mistake that is best forgotten. If the above approach were taken, then I don't think you need anything more than echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock (or whatever), though it you really want to wrap that in a shell script that might be ok. > > For the kernel piece, since we're there anyway, could we have the > individual lockd IP interface passed to SM (statd) (in SM_MON call) ? > This would allow statd to structure its SM files based on each lockd IP > address, an important part of lock recovery. > Maybe.... but I don't get the scenario. Surely the SM files are only needed when the server restarts, and in that case it needs to notify all clients... Or is it that you want to make sure the notification comes from the right IP address.... I guess that would make sense. I that what you are after? > > One is to register a callback when an interface is shut down. > > Haven't checked out (linux) socket interface yet. I'm very fuzzy how > this can be done. Anyone has good ideas ? 
No good idea, but I have a feeling there is a callback we could use. However I think I am going off this idea. > > > Another (possibly the best) is to arrange a new signal for lockd > > which say "Drop any locks which were sent to IP addresses that are > > no longer valid local addresses". > > Very appealing - but the devil's always in the details. How to decide > which IP address is no longer valid ? Or how does lockd know about these > IP addresses ? And how to associate one particular IP address with the > "struct nlm_file" entries within nlm_files list ? Need few more days to > sort this out (or any one already has ideas in mind ?). See above. NeilBrown From wcheng at redhat.com Thu Jun 15 06:39:24 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Thu, 15 Jun 2006 02:39:24 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <17552.57749.121240.42384@cse.unsw.edu.au> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <17552.57749.121240.42384@cse.unsw.edu.au> Message-ID: <1150353564.4566.89.camel@localhost.localdomain> On Thu, 2006-06-15 at 14:27 +1000, Neil Brown wrote: > You started out suggesting that the required functionality was to > "remove all locks that lockd holds on a particular filesystem". I didn't make this clear. No, we don't want to "remove all locks associated with a particular filesystem". We want to "remove all locks associated with an NFS service" - one NFS service is normally associated with one NFS export. For example, say in /etc/exports: /mnt/export_fs/dir_1 *(fsid=1,async,rw) /mnt/export_fs/dir_2 *(fsid=2,async,rw) One same filesystem (export_fs) is exported via two entries, each with its own fsid. The "fsid" is eventually encoded as part of the filehanlde stored into "struct nlm_file" and linked into nlm_file global list. This is to allow, not only active-active failover (for local filesystem such as ext3), but also load balancing for cluster file systems (such as GFS). In reality, each NFS service is associated with one virtual IP. The failover and load-balancing tasks are carried out by moving the virtual IP around - so I'm ok with the idea of "remove all locks that lockd holds on behalf of a particular IP address". > > Lockd is not currently structured to associate locks with > server-ip-addresses. There is an assumption that one client may talk > to any of the IP addresses that the server supports. This is clearly > not the case for the failover scenario that you are considering, so a > little restructuring might be in order. > > Some locks will be held on behalf of a client, no matter what > interface the requests arrive on. Other locks will be held on behalf > of a client and tied to a particular server IP address. Probably the > easiest way to make this distinction in as a new nfsd export flag. We're very close now - note that I originally proposed adding a new nfsd export flag (NFSEXP_FOLOCKS) so we can OR it into export's ex_flag upon un-export. If the new action flag is set, a new sub-call added into unexport kernel routine will walk thru nlm_file to find the export entry (matched by either fsid or devno, taken from filehandle, within nlm_file struct); then subsequently release the lock. The ex_flag is an "int" but currently only used up to 16 bit. So my new export flag is defined as: NFSEXP_FOLOCKS 0x00010000. > > So, maybe something like this: > > Add a 'struct sockaddr_in' to 'struct nlm_file'. 
> If nlm_fopen return (say) 3, then treat is as success, and > also copy rqstp->rq_addr into that 'sockaddr_in'. > define a new file in the 'nfsd' filesystem into which can > be written an IP address and which calls some new lockd > function which releases all locks held for that IP address. > Probably get nlm_lookup_file to insist that if the sockaddr_in > is defined in a lock, it must match the one in rqstp Yes, we definitely can do this but there is a "BUT" from our end. What I did in my prototyping code is taking filehandle from nlm_file structure and yank the fsid (or devno) out of it (so we didn't need to know the socket address). With (your) above approach, adding a new field into "struct nlm_file" to hold the sock addr, sadly say, violates our KABI policy. I learnt my lesson. Forget KABI for now. Let me see what you have in the next paragraph (so I can know how to response ...) > > > > > One is the multiple-lockd-threads idea. > > I'm losing interest in the multiple-lockd-threads approach myself (for > the moment anyway :-) > However I would be against trying to re-use rpc.lockd - that was a > mistake that is best forgotten. > If the above approach were taken, then I don't think you need anything > more than > echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock > (or whatever), though it you really want to wrap that in a shell > script that might be ok. This is funny - so we go back to /proc. OK with me :) but you may want to re-think my exportfs command approach. Want me to go over the unexport flow again ? The idea is to add a new user mode flag, say "-h". If you unexport the interface as: shell> exportfs -u *:/export_path // nothing happens, old behavior but if you do: shell> exportfs -hu *:/export_patch // the kernel code would walk thru // nlm_file list to release the // the locks. The "-h" "OR" 0x0001000 into ex_flags field of struct nfsctl_export so kernel can know what to do. With fsid (or devno) in filehandle within nlm_file, we don't need socket address at all. But again, I'm OK with /proc approach. However, with /proc approach, we may need socket address (since not every export uses fsid and devno is not easy to get). Do we agree now ? In simple sentence, I prefer my original "exportfs - hu" approach. But I'm ok with /proc if you insist. > > > > > For the kernel piece, since we're there anyway, could we have the > > individual lockd IP interface passed to SM (statd) (in SM_MON call) ? > > This would allow statd to structure its SM files based on each lockd IP > > address, an important part of lock recovery. > > > > Maybe.... but I don't get the scenario. > Surely the SM files are only needed when the server restarts, and in > that case it needs to notify all clients... Or is it that you want to > make sure the notification comes from the right IP address.... I guess > that would make sense. I that what you are after? Yes ! Right now, lockd doesn't pass the specific server address (that client connects to) to statd. I don't know how the "-H" can ever work. Consider this a bug. If you forget what "rpc.statd -H" is, check out the man page (man rpc.statd). Thank you for the patience - I'm grateful. 
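To make the shape of that /proc-style interface a little more concrete, here is a small stand-alone C model of the bookkeeping being discussed: each lock entry remembers the server address the request arrived on, and one routine drops every entry bound to a given address - the step that writing an IP into the proposed nfsd file would trigger. The types and names below (svc_lock, release_locks_for_ip) and the addresses are invented for illustration only; this is not the real nlm_file/lockd code.

/*
 * Stand-alone model of "drop every lock bound to one server IP".
 * Build: cc -o unlock_ip_model unlock_ip_model.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <netinet/in.h>

struct svc_lock {
	char fh[32];			/* file handle the lock is held on */
	struct in_addr server_addr;	/* server IP the request came in on */
	struct svc_lock *next;
};

static struct svc_lock *lock_list;

static void add_lock(const char *fh, const char *server_ip)
{
	struct svc_lock *l = calloc(1, sizeof(*l));

	snprintf(l->fh, sizeof(l->fh), "%s", fh);
	inet_aton(server_ip, &l->server_addr);
	l->next = lock_list;
	lock_list = l;
}

/* The equivalent of writing a.b.c.d into the proposed nfsd file:      */
/* walk the list and free every lock taken against that server address */
static int release_locks_for_ip(const char *server_ip)
{
	struct in_addr addr;
	struct svc_lock **pp = &lock_list, *l;
	int dropped = 0;

	if (!inet_aton(server_ip, &addr))
		return -1;
	while ((l = *pp) != NULL) {
		if (l->server_addr.s_addr == addr.s_addr) {
			*pp = l->next;	/* unlink and release */
			free(l);
			dropped++;
		} else {
			pp = &l->next;
		}
	}
	return dropped;
}

int main(void)
{
	/* two locks arrived via a virtual IP, one via the node's own IP */
	add_lock("fsid=1:inode=100", "192.168.0.10");
	add_lock("fsid=1:inode=200", "192.168.0.10");
	add_lock("fsid=2:inode=300", "192.168.0.1");

	printf("dropped %d locks for 192.168.0.10\n",
	       release_locks_for_ip("192.168.0.10"));
	printf("dropped %d locks for 192.168.0.10 (again)\n",
	       release_locks_for_ip("192.168.0.10"));
	return 0;
}

The same walk could just as easily be keyed on an fsid pulled out of the stored file handle instead of a socket address, which is essentially the difference between the two proposals above.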
-- Wendy From neilb at suse.de Thu Jun 15 08:02:48 2006 From: neilb at suse.de (Neil Brown) Date: Thu, 15 Jun 2006 18:02:48 +1000 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: message from Wendy Cheng on Thursday June 15 References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <17552.57749.121240.42384@cse.unsw.edu.au> <1150353564.4566.89.camel@localhost.localdomain> Message-ID: <17553.5160.366425.740082@cse.unsw.edu.au> On Thursday June 15, wcheng at redhat.com wrote: > On Thu, 2006-06-15 at 14:27 +1000, Neil Brown wrote: > > > You started out suggesting that the required functionality was to > > "remove all locks that lockd holds on a particular filesystem". > > I didn't make this clear. No, we don't want to "remove all locks > associated with a particular filesystem". We want to "remove all locks > associated with an NFS service" - one NFS service is normally associated > with one NFS export. For example, say in /etc/exports: > > /mnt/export_fs/dir_1 *(fsid=1,async,rw) > /mnt/export_fs/dir_2 *(fsid=2,async,rw) That makes sense. > > One same filesystem (export_fs) is exported via two entries, each with > its own fsid. The "fsid" is eventually encoded as part of the filehanlde > stored into "struct nlm_file" and linked into nlm_file global list. > > This is to allow, not only active-active failover (for local filesystem > such as ext3), but also load balancing for cluster file systems (such as > GFS). Could you please explain to me what "active-active failover for local filesystem such as ext3" means (I'm not very familiar with cluster terminology). It sounds like the filesystem is active on two nodes at once, which of course cannot work for ext3, so I am confused. And if you are doing "failover", what has failed? The load-balancing scenario makes sense (at least so far...). > > In reality, each NFS service is associated with one virtual IP. The > failover and load-balancing tasks are carried out by moving the virtual > IP around - so I'm ok with the idea of "remove all locks that lockd > holds on behalf of a particular IP address". > Good. :-) > > > > Lockd is not currently structured to associate locks with > > server-ip-addresses. There is an assumption that one client may talk > > to any of the IP addresses that the server supports. This is clearly > > not the case for the failover scenario that you are considering, so a > > little restructuring might be in order. > > > > Some locks will be held on behalf of a client, no matter what > > interface the requests arrive on. Other locks will be held on behalf > > of a client and tied to a particular server IP address. Probably the > > easiest way to make this distinction in as a new nfsd export flag. > > We're very close now - note that I originally proposed adding a new nfsd > export flag (NFSEXP_FOLOCKS) so we can OR it into export's ex_flag upon > un-export. If the new action flag is set, a new sub-call added into > unexport kernel routine will walk thru nlm_file to find the export entry > (matched by either fsid or devno, taken from filehandle, within nlm_file > struct); then subsequently release the lock. > > The ex_flag is an "int" but currently only used up to 16 bit. So my new > export flag is defined as: NFSEXP_FOLOCKS 0x00010000. > Our two export flags mean VERY different things. Mine says 'locks against this export are per-server-ip-address'. 
Yours says (I think) 'remove all lockd locks from this export' and is really an unexport flag, not an export flag. And this makes it not really workable. We no-longer require the user of the nfssvc syscall to unexport filesystems. Infact nfs-utils doesn't use it at all if /proc/fs/nfsd is mounted. filesystems are unexported by their entry in the export cache expiring, or the cache being flushed. There is simply no room in the current knfsd design for an unexport flag - sorry ;-( > > > > So, maybe something like this: > > > > Add a 'struct sockaddr_in' to 'struct nlm_file'. > > If nlm_fopen return (say) 3, then treat is as success, and > > also copy rqstp->rq_addr into that 'sockaddr_in'. > > define a new file in the 'nfsd' filesystem into which can > > be written an IP address and which calls some new lockd > > function which releases all locks held for that IP address. > > Probably get nlm_lookup_file to insist that if the sockaddr_in > > is defined in a lock, it must match the one in rqstp > > Yes, we definitely can do this but there is a "BUT" from our end. What I > did in my prototyping code is taking filehandle from nlm_file structure > and yank the fsid (or devno) out of it (so we didn't need to know the > socket address). With (your) above approach, adding a new field into > "struct nlm_file" to hold the sock addr, sadly say, violates our KABI > policy. Does it? 'struct nlm_file' is a structure that is entirely local to lockd. It does not feature in any of the interface between lockd and any other part of the kernel. It is not part of any credible KABI. The other changes I suggest involve adding an exported symbol to lockd, which does change the KABI but in a completely back-compatible way, and re-interpreting the return value of a callout. That could not break any external module - it could only break someone's setup if they had an alternate lockd module, but I don't your KABI policy allows people to replace modules and stay supported, However, as you say.... > > I learnt my lesson. Forget KABI for now. Let me see what you have in the > next paragraph (so I can know how to response ...) > ....we aren't going to let KABI issues get in our way. > > > > > > > > One is the multiple-lockd-threads idea. > > > > I'm losing interest in the multiple-lockd-threads approach myself (for > > the moment anyway :-) > > However I would be against trying to re-use rpc.lockd - that was a > > mistake that is best forgotten. > > If the above approach were taken, then I don't think you need anything > > more than > > echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock > > (or whatever), though it you really want to wrap that in a shell > > script that might be ok. > > This is funny - so we go back to /proc. OK with me :) Only sort-of back to /proc. /proc/fs/nfsd is a separate filesystem which happens to be mounted there normally. The unexport system call goes through this exact same filesystem (though it is somewhat under-the-hood) so at that level, we are really propose the same style of interface implementation. > but you may want > to re-think my exportfs command approach. Want me to go over the > unexport flow again ? The idea is to add a new user mode flag, say "-h". > If you unexport the interface as: > > shell> exportfs -u *:/export_path // nothing happens, old behavior > > but if you do: > > shell> exportfs -hu *:/export_patch // the kernel code would walk thru > // nlm_file list to release the > // the locks. 
> > The "-h" "OR" 0x0001000 into ex_flags field of struct nfsctl_export so > kernel can know what to do. With fsid (or devno) in filehandle within > nlm_file, we don't need socket address at all. But apart from nfsctl_export being a dead end, this is still exportpoint specific rather than IP address specific. > > But again, I'm OK with /proc approach. However, with /proc approach, we > may need socket address (since not every export uses fsid and devno is > not easy to get). Absolutely. We need a socket address. As part of this process you are shutting down an interface. We know (or can easily discover) the address of that interface. That is exactly the address that we feed to nfsd. > > Do we agree now ? In simple sentence, I prefer my original "exportfs - > hu" approach. But I'm ok with /proc if you insist. > I'm not at an 'insist'ing stage at the moment - I like to at least pretend to be open minded :-) The main thing I don't like about your "exportfs -hu" approach is that I don't think it will work (actually, looking at nfs-utils, I'm not so sure that "exportfs -u" will work at all if you don't have /proc/fs/nfsd mounted....) The other thing I don't like is that it doesn't address your primary need - decommissioning an IP address. Rather it addresses a secondary need - removing some locks from some filesystems. But I'm still open to debate... > > > > > > > > > For the kernel piece, since we're there anyway, could we have the > > > individual lockd IP interface passed to SM (statd) (in SM_MON call) ? > > > This would allow statd to structure its SM files based on each lockd IP > > > address, an important part of lock recovery. > > > > > > > Maybe.... but I don't get the scenario. > > Surely the SM files are only needed when the server restarts, and in > > that case it needs to notify all clients... Or is it that you want to > > make sure the notification comes from the right IP address.... I guess > > that would make sense. I that what you are after? > > Yes ! Right now, lockd doesn't pass the specific server address (that > client connects to) to statd. I don't know how the "-H" can ever work. > Consider this a bug. If you forget what "rpc.statd -H" is, check out the > man page (man rpc.statd). I have to admit I have never given that code a lot of attention. I reviewed when sent it - it seemed to make sense and had no obvious problems - so I accepted it. I wouldn't be enormously surprised if it didn't work in some situations. > > Thank you for the patience - I'm grateful. Ditto. Conversations work much better when people are patient and polite. Thanks, NeilBrown From andros at citi.umich.edu Thu Jun 15 14:07:43 2006 From: andros at citi.umich.edu (William A.(Andy) Adamson) Date: Thu, 15 Jun 2006 10:07:43 -0400 Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface In-Reply-To: <1150293654.28264.91.camel@localhost.localdomain> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <1150293654.28264.91.camel@localhost.localdomain> Message-ID: <20060615140743.36CDC1BBAD@citi.umich.edu> this discusion has centered around removing the locks of an export. we also want the interface to ge able to remove the locks owned by a single client. this is needed to enable client migration between replica's or between nodes in a cluster file system. it is not acceptable to place an entire export in grace just to move a small number of clients. 
-->Andy wcheng at redhat.com said: > On Wed, 2006-06-14 at 02:54 -0400, Wendy Cheng wrote: > > Assume we still have this on the table.... Could I expect the admin > interface goes thru rpc.lockd command (man page and nfs-util code > changes) ? The modified command will take similar options as rpc.statd; > more specifically, the -n, -o, and -p (see "man rpc.statd"). To pass the > individual IP (socket address) to kernel, we'll need nfsctl with struct > nfsctl_svc modified. > > I want to make sure people catch this. Here we're talking about NFS system > call interface changes. We need either a new NFS syscall or altering the > existing nfsctl_svc structure. > -- Wendy From wcheng at redhat.com Thu Jun 15 15:09:41 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Thu, 15 Jun 2006 11:09:41 -0400 Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface In-Reply-To: <20060615140743.36CDC1BBAD@citi.umich.edu> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <1150293654.28264.91.camel@localhost.localdomain> <20060615140743.36CDC1BBAD@citi.umich.edu> Message-ID: <44917835.40805@redhat.com> William A.(Andy) Adamson wrote: >this discusion has centered around removing the locks of an export. >we also want the interface to ge able to remove the locks owned by a single >client. this is needed to enable client migration between replica's or between >nodes in a cluster file system. it is not acceptable to place an entire export >in grace just to move a small number of clients. > > > Andy, Gotcha ... forgot about NFS V4. BTW, the discussion has moved back to /proc interface. I agree we need to add one more layer of granularity into it. Glad you caught this flaw. -- Wendy From smartjoe at gmail.com Thu Jun 15 16:30:22 2006 From: smartjoe at gmail.com (jOe) Date: Fri, 16 Jun 2006 00:30:22 +0800 Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with new fencing mechanisms? In-Reply-To: <1150339626.2982.51.camel@localhost.localdomain> References: <1150339626.2982.51.camel@localhost.localdomain> Message-ID: On 6/15/06, Kevin Anderson wrote: > > On Thu, 2006-06-15 at 02:49 +0800, jOe wrote: > > Hello all, > > > > Sorry if this is a stupid question. > > > > I deploy both HP MC/SG linux edition and RHCS for our customers. I > > just wondered why the latest RHCS remove quorum partition/lock lun > > with the new fencing mechanisms(powerswitch,iLO/DRAC, SAN > > switch....)? > > First off, I don't think it is completely fair to compare quorum > partitions to fencing. They really serve different purposes. Quorum > partition gives you the ability to maintain the cluster through flakey > network spikes. It will keep you from prematurely removing nodes from > the cluster. Fencing is really used to provide data integrity of your > shared storage devices. You really want to make sure that a node is > gone before recovering their data. Just because a node isn't updating > the quorum partition, doesn't mean it isn't still scrogging your file > systems. However, a combination of the two provides a pretty solid > cluster in small configurations. And a quorum disk has another nice > feature that is useful. > > That said, a little history before I get to the punch line. Two > clustering technologies were merged together for RHCS 4.x releases and > the resulting software used the core cluster infrastructure that was > part of the GFS product for both RHCS and RHGFS. 
GFS didn't have a > quorum partition as an option primarily due to scalability reasons. The > quorum disk works fine for a limited number of nodes, but the core > cluster infrastructure needed to be able to scale to large numbers. The > fencing mechanisms provide the ability to ensure data integrity in that > type of configuration. So, the quorum disk wasn't carried into the new > cluster infrastructure at that time. > > Good news is we realized the deficiency and have added quorum disk > support and it will be part of the RHCS4.4 update release which should > be hitting the RHN beta sites within a few days. This doesn't replace > the need to have a solid fencing infrastructure in place. When a node > fails, you still need to ensure that it is gone and won't corrupt the > filesystem. Quorum disk will still have scalability issues and is > really targeted at small clusters, ie <16 nodes. This is primarily due > to having multiple machines pounding on the same storage device. It > also provides an additional feature, the ability to represent a > configurable number of votes. If you set the quorum device to have the > same number of votes as nodes in the cluster. You can maintain cluster > sanity down to a single active compute node in the cluster. We can get > rid of our funky special two node configuration option. You will then > be able to grow a two node cluster without having to reset. > > Sorry I rambled a bit.. > > Thanks > Kevin > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thank you very much Kevin, your information is very useful to us and i've shared it to our engineer team. Here are two questions still left: Q1: In a two node cluster config, how does RHCS(v4) handle the heartbeat failed ? (suppose the bonded heartbeat path still failed by some bad situations). When using quorum disk/lock lun, the quorum will act as a tier breaker and solve the brain-split if heartbeat failed. Currently the GFS will do this ? or other part of RHCS? Q2: As you mentioned the quorum disk support is added into RHCS v4.4 update release, so in a two-nodes-cluster config "quorum disk+bonding heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the recommended config from RedHat? Almost 80% cluster requests from our customers are around two-nodes-cluster(10% is RAC and the left is hpc cluster), We really want to provide our customers a simple and solid cluster config in their production environment, Most customer configure their HA cluster as Active/passive so GFS is not necessary to them and they even don't want GFS exists in their two-nodes-cluster system. I do think more and more customers will choose RHCS as their cluster solution and we'll push this after completely understand RHCS's technical benefits and advanced mechanisms. Thanks a lot, Jun -------------- next part -------------- An HTML attachment was scrubbed... URL: From admin.cluster at gmail.com Thu Jun 15 17:05:39 2006 From: admin.cluster at gmail.com (Anthony) Date: Thu, 15 Jun 2006 19:05:39 +0200 Subject: [Linux-cluster] GFS failure Message-ID: <44919363.80806@gmail.com> Hello, yesterday, we had a full GFS system Fail, all partitions were unaccessible from all the 32 nodes. and now all the cluster is inaccessible. did any one had already seen this problem? GFS: Trying to join cluster "lock_gulm", "gen:ir" GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS... GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock... 
GFS: fsid=gen:ir.32: jid=32: Looking at journal... GFS: fsid=gen:ir.32: jid=32: Done NETDEV WATCHDOG: jnet0: transmit timed out ipmi_kcs_sm: kcs hosed: Not in read state for error2 NETDEV WATCHDOG: jnet0: transmit timed out ipmi_kcs_sm: kcs hosed: Not in read state for error2 GFS: fsid=gen:ir.32: fatal: filesystem consistency error GFS: fsid=gen:ir.32: function = trans_go_xmote_bh GFS: fsid=gen:ir.32: file = /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542 GFS: fsid=gen:ir.32: time = 1150223491 GFS: fsid=gen:ir.32: about to withdraw from the cluster GFS: fsid=gen:ir.32: waiting for outstanding I/O GFS: fsid=gen:ir.32: telling LM to withdraw From rpeterso at redhat.com Thu Jun 15 18:18:47 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Thu, 15 Jun 2006 13:18:47 -0500 Subject: [Linux-cluster] GFS failure In-Reply-To: <44919363.80806@gmail.com> References: <44919363.80806@gmail.com> Message-ID: <4491A487.6090501@redhat.com> Anthony wrote: > Hello, > > yesterday, > we had a full GFS system Fail, > all partitions were unaccessible from all the 32 nodes. > and now all the cluster is inaccessible. > did any one had already seen this problem? > > > GFS: Trying to join cluster "lock_gulm", "gen:ir" > GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS... > GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock... > GFS: fsid=gen:ir.32: jid=32: Looking at journal... > GFS: fsid=gen:ir.32: jid=32: Done > > NETDEV WATCHDOG: jnet0: transmit timed out > ipmi_kcs_sm: kcs hosed: Not in read state for error2 > NETDEV WATCHDOG: jnet0: transmit timed out > ipmi_kcs_sm: kcs hosed: Not in read state for error2 > > GFS: fsid=gen:ir.32: fatal: filesystem consistency error > GFS: fsid=gen:ir.32: function = trans_go_xmote_bh > GFS: fsid=gen:ir.32: file = > /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, > line = 542 > GFS: fsid=gen:ir.32: time = 1150223491 > GFS: fsid=gen:ir.32: about to withdraw from the cluster > GFS: fsid=gen:ir.32: waiting for outstanding I/O > GFS: fsid=gen:ir.32: telling LM to withdraw Hi Anthony, This problem could be caused by a couple of things. Basically, it indicates a filesystem consistency error occurred. In this particular case, it means that a write was done to the file system, and a transaction lock was taken out, but after the write transaction, the journal for the written data was found to be still in use. That means one of two things: Either (1) some process was writing to the GFS journal when they shouldn't be (i.e. without the necessary lock) or else (2) the journal data written was somehow corrupted on disk. In the past, we've often tracked down such problems to hardware failures; in other words, even without the GFS file system in the loop, if you use a command like 'dd' to send data to the raw hard disk device, then use dd to retrieve it, the data comes back from the hardware different than what was written out. That particular scenario is documented as bugzilla bug 175589. I'm not saying that is your problem, but I'm saying that's what we've seen in the past. My recommendation is to read the bugzilla, back up your entire file system or copy it to a different set of drives, then perhaps you can do some hardware tests as described in the bugzilla to see whether your hardware can consistently write data, read it back, and get a match between what was written and what was read back. Do this test without GFS in there at all, and hopefully with only one node accessing that storage at a time. 
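For anyone who wants to script the write/read-back comparison described above, the stand-alone program below is one minimal way to do it. Assumptions: the target path and size come from the command line, the pattern is a fixed-seed pseudo-random block, and whatever is at that path gets overwritten - so only ever point it at a scratch file or a spare LUN that holds no data. Note also that the read pass here can be satisfied from the page cache; for a strict test of the hardware you would still want to take the cache out of the picture (e.g. open the device with O_DIRECT), which this sketch leaves out for brevity.

/*
 * rwcheck.c - fill a scratch device or file with a known pattern,
 * read it back and compare.  DESTRUCTIVE to the target path.
 *
 * Build: cc -o rwcheck rwcheck.c
 * Run:   ./rwcheck <scratch-path> <megabytes>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define BLKSZ (1024 * 1024)

int main(int argc, char **argv)
{
	static unsigned char wbuf[BLKSZ], rbuf[BLKSZ];
	long mb, i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <path> <megabytes>\n", argv[0]);
		return 1;
	}
	mb = atol(argv[2]);

	srand(12345);			/* fixed seed => reproducible pattern */
	for (i = 0; i < BLKSZ; i++)
		wbuf[i] = rand() & 0xff;

	fd = open(argv[1], O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		perror(argv[1]);
		return 1;
	}

	for (i = 0; i < mb; i++)	/* write pass */
		if (write(fd, wbuf, BLKSZ) != BLKSZ) {
			perror("write");
			return 1;
		}
	fsync(fd);			/* push the data out to the device */

	if (lseek(fd, 0, SEEK_SET) < 0) {
		perror("lseek");
		return 1;
	}
	for (i = 0; i < mb; i++) {	/* read pass + compare */
		if (read(fd, rbuf, BLKSZ) != BLKSZ) {
			perror("read");
			return 1;
		}
		if (memcmp(wbuf, rbuf, BLKSZ) != 0) {
			fprintf(stderr, "MISMATCH in MB %ld\n", i);
			return 2;
		}
	}
	printf("%ld MB written and read back identically\n", mb);
	close(fd);
	return 0;
}

Run it only against a scratch target, e.g. "./rwcheck /tmp/scratch.img 512", and treat any MISMATCH as a reason to suspect the path between that node and the storage rather than the file system on top of it.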
You will probably also want to run gfs_fsck before mounting again to check the consistency of the file system, just in case some rogue process on one of the nodes was doing something destructive. WARNING: overwriting your GFS file system will of course damage what was there, so you better be careful not to destroy your data and make a copy before doing this. If the hardware checks out 100% and you can recreate the failure, open a bugzilla against GFS and we'll go from there. In other words, we don't know of any problems with GFS that can cause this, beyond hardware problems. I hope this helps. Regards, Bob Peterson Red Hat Cluster Suite From teigland at redhat.com Thu Jun 15 18:26:25 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 15 Jun 2006 13:26:25 -0500 Subject: [Linux-cluster] GFS failure In-Reply-To: <44919363.80806@gmail.com> References: <44919363.80806@gmail.com> Message-ID: <20060615182624.GA1913@redhat.com> On Thu, Jun 15, 2006 at 07:05:39PM +0200, Anthony wrote: > Hello, > > yesterday, > we had a full GFS system Fail, > all partitions were unaccessible from all the 32 nodes. > and now all the cluster is inaccessible. > did any one had already seen this problem? > > > GFS: Trying to join cluster "lock_gulm", "gen:ir" > GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS... > GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock... > GFS: fsid=gen:ir.32: jid=32: Looking at journal... > GFS: fsid=gen:ir.32: jid=32: Done > > NETDEV WATCHDOG: jnet0: transmit timed out > ipmi_kcs_sm: kcs hosed: Not in read state for error2 > NETDEV WATCHDOG: jnet0: transmit timed out > ipmi_kcs_sm: kcs hosed: Not in read state for error2 > > GFS: fsid=gen:ir.32: fatal: filesystem consistency error > GFS: fsid=gen:ir.32: function = trans_go_xmote_bh > GFS: fsid=gen:ir.32: file = > /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, > line = 542 > GFS: fsid=gen:ir.32: time = 1150223491 > GFS: fsid=gen:ir.32: about to withdraw from the cluster > GFS: fsid=gen:ir.32: waiting for outstanding I/O > GFS: fsid=gen:ir.32: telling LM to withdraw This looks like https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164331 which was fixed back in March and should be in the latest rpm's or source tarball. Dave From wcheng at redhat.com Thu Jun 15 18:43:50 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Thu, 15 Jun 2006 14:43:50 -0400 Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface In-Reply-To: <17553.5160.366425.740082@cse.unsw.edu.au> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <17552.57749.121240.42384@cse.unsw.edu.au> <1150353564.4566.89.camel@localhost.localdomain> <17553.5160.366425.740082@cse.unsw.edu.au> Message-ID: <4491AA66.2050900@redhat.com> Neil Brown wrote: >Could you please explain to me what "active-active failover for local >filesystem such as ext3" means > Clustering is a profilic subject so the term may mean different things to different people. The setup we discuss here is to move an NFS service from one server to the other while both servers are up and running (active-active). The goal is not to disturb other NFS services that are not involved with the transition. >It sounds like the filesystem is active on two nodes at once, which of >course cannot work for ext3, so I am confused. >And if you are doing "failover", what has failed? > >The load-balancing scenario makes sense (at least so far...). 
> > Local filesystem such as ext3 will never be mounted on more than two nodes but cluster filesystems (e.g. our GFS) will. Moving ext3 normally implies error conditions (a true failover) though in rare cases, it may be kicked off for load balancing purpose. Current GFS locking has the "node-id" concept - the easiest way (at this moment) for virtual IP to float around is to drop the locks and let NLM reclaim the locks from the new server. > >Our two export flags mean VERY different things. >Mine says 'locks against this export are per-server-ip-address'. >Yours says (I think) 'remove all lockd locks from this export' and is >really an unexport flag, not an export flag. > >And this makes it not really workable. We no-longer require the user >of the nfssvc syscall to unexport filesystems. Infact nfs-utils doesn't >use it at all if /proc/fs/nfsd is mounted. filesystems are unexported >by their entry in the export cache expiring, or the cache being >flushed. > > The important thing (for me) is the vfsmount reference count which can only be properly decreased when unexport is triggered. Without decreasing the vfsmount, ext3 can not be un-mounted (and we need to umount ext3 upon failover). I havn't looked into community versions of kernel source for a while (but I'll check). So what can I do to ensure this will happen ? - i.e., after the filesystem has been accessed by nfsd, how can I safely un-mount it without shuting down nfsd (and/or lockd) ? >'struct nlm_file' is a structure that is entirely local to lockd. >It does not feature in any of the interface between lockd and any >other part of the kernel. It is not part of any credible KABI. >The other changes I suggest involve adding an exported symbol to >lockd, which does change the KABI but in a completely back-compatible >way, and re-interpreting the return value of a callout. >That could not break any external module - it could only break >someone's setup if they had an alternate lockd module, but I don't >your KABI policy allows people to replace modules and stay supported, > > Yes, you're right ! I looked into the wrong code (well, it was late in the night so I was not very functional at that moment). Had some prototype code where I transported the nlm_file from one server to another server , experimenting auto-reclaiming locks without stated. I exported the nlm_file list there. So let's forget about this >>>>> One is the multiple-lockd-threads idea. >>>>> >>>>> >>>I'm losing interest in the multiple-lockd-threads approach myself (for >>>the moment anyway :-) >>> >>> Good! because I'm not sure whether we'll hit scalibility issue or not (100 nfs services implies 100 lockd threads !). >>>However I would be against trying to re-use rpc.lockd - that was a >>>mistake that is best forgotten. >>> >>> Highlight this :) ... Give me some comfort feelings that I'm not the only person who would make mistakes. >>>If the above approach were taken, then I don't think you need anything >>>more than >>> echo aa.bb.cc.dd > /proc/fs/nfsd/vserver_unlock >>>(or whatever), though it you really want to wrap that in a shell >>>script that might be ok. >>> >>> >>This is funny - so we go back to /proc. OK with me :) >> >> > >Only sort-of back to /proc. /proc/fs/nfsd is a separate filesystem >which happens to be mounted there normally. >The unexport system call goes through this exact same filesystem >(though it is somewhat under-the-hood) so at that level, we are >really propose the same style of interface implementation. 
> > >>But again, I'm OK with /proc approach. However, with /proc approach, we >>may need socket address (since not every export uses fsid and devno is >>not easy to get). >> >> > >Absolutely. We need a socket address. >As part of this process you are shutting down an interface. We know >(or can easily discover) the address of that interface. That is >exactly the address that we feed to nfsd. > > Now, it looks good ! Will do the following: 1. Futher understand the steps to make sure we can un-mount ext3 due to "unexport" method changes. 2. Start to code to the /proc interface and make sure "rpc.stated -H"can work (lock reclaiming needs it). Will keep NFS v4 in mind as well. By the way, there is a socket state-change-handler (TCP only) and/or network interface notification routine that seem to be workable (your previous thoughts). However, I don't plan to keep exploring that possibility since we now have a simple and workable method in place. -- Wendy From teigland at redhat.com Thu Jun 15 19:09:59 2006 From: teigland at redhat.com (David Teigland) Date: Thu, 15 Jun 2006 14:09:59 -0500 Subject: [Linux-cluster] GFS locking issues In-Reply-To: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> Message-ID: <20060615190959.GB1913@redhat.com> On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote: > Is there any ideas of how to fix this? I mean either the reason ('D' > state of killed httpd-s) or consequences (the GFS filesystem fully or > partially become unavailable after this). > > I also appreciate any help with debugging the problem. > > I tried gfs_tool lockdump with decipher_lockstate_dump tool. I don't see anything wrong in the lockdumps you gave, although I'm not an expert at interpreting gfs lockdumps. Could you do a ps showing the wchan for those processes? Using sysrq to get a stack dump would also be useful. You might also do a dlm lock dump and pick out those locks: echo "lockspace name" >> /proc/cluster/dlm_locks cat /proc/cluster/dlm_locks I/O stuck in gnbd could also be a problem, I'm not sure what the signs of that might be apart from possibly the wchan. Dave From kanderso at redhat.com Thu Jun 15 19:38:34 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 15 Jun 2006 14:38:34 -0500 Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with new fencing mechanisms? In-Reply-To: References: <1150339626.2982.51.camel@localhost.localdomain> Message-ID: <1150400314.2810.34.camel@localhost.localdomain> On Fri, 2006-06-16 at 00:30 +0800, jOe wrote: > > Thank you very much Kevin, your information is very useful to us and > i've shared it to our engineer team. > Here are two questions still left: > Q1: In a two node cluster config, how does RHCS(v4) handle the > heartbeat failed ? (suppose the bonded heartbeat path still failed by > some bad situations). Current configuration requires using power fencing when running the special case two node cluster. If you lose heartbeat between the two machines, both nodes will attempt to fence the other node. The node that wins the fencing race gets to stay up, the other node is reset and won't be able to re-establish quorum until connectivity is restored. > When using quorum disk/lock lun, the quorum will act as a tier breaker > and solve the brain-split if heartbeat failed. Currently the GFS will > do this ? or other part of RHCS? Quorum support is integrated in the core cluster infrastructure so is usable with just RHCS. 
You do not need GFS to use a quorum disk. > > Q2: As you mentioned the quorum disk support is added into RHCS v4.4 > update release, so in a two-nodes-cluster config "quorum disk+bonding > heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the > recommended config from RedHat? Almost 80% cluster requests from our > customers are around two-nodes-cluster(10% is RAC and the left is hpc > cluster), We really want to provide our customers a simple and solid > cluster config in their production environment, Most customer > configure their HA cluster as Active/passive so GFS is not necessary > to them and they even don't want GFS exists in their two-nodes-cluster > system. If you have access to shared storage, then a two node cluster with quorum disk/fencing would be a better configuration and could be the recommended configuration. However, there are still cases where you could have a two node cluster with no shared storage. Depends on how the application is sharing state or accessing data. But for an active/passive two node failover cluster, I can see where the quorum disk will be very popular. Kevin From smartjoe at gmail.com Thu Jun 15 19:51:01 2006 From: smartjoe at gmail.com (jOe) Date: Fri, 16 Jun 2006 03:51:01 +0800 Subject: [Linux-cluster] Why Redhat replace quorum partition/lock lun with new fencing mechanisms? In-Reply-To: <1150400314.2810.34.camel@localhost.localdomain> References: <1150339626.2982.51.camel@localhost.localdomain> <1150400314.2810.34.camel@localhost.localdomain> Message-ID: On 6/16/06, Kevin Anderson wrote: > > On Fri, 2006-06-16 at 00:30 +0800, jOe wrote: > > > > > > Thank you very much Kevin, your information is very useful to us and > > i've shared it to our engineer team. > > Here are two questions still left: > > Q1: In a two node cluster config, how does RHCS(v4) handle the > > heartbeat failed ? (suppose the bonded heartbeat path still failed by > > some bad situations). > > Current configuration requires using power fencing when running the > special case two node cluster. If you lose heartbeat between the two > machines, both nodes will attempt to fence the other node. The node > that wins the fencing race gets to stay up, the other node is reset and > won't be able to re-establish quorum until connectivity is restored. > > > When using quorum disk/lock lun, the quorum will act as a tier breaker > > and solve the brain-split if heartbeat failed. Currently the GFS will > > do this ? or other part of RHCS? > > Quorum support is integrated in the core cluster infrastructure so is > usable with just RHCS. You do not need GFS to use a quorum disk. > > > > > Q2: As you mentioned the quorum disk support is added into RHCS v4.4 > > update release, so in a two-nodes-cluster config "quorum disk+bonding > > heartbeat+fencing(powerswitch or iLO/DRAC) (no GFS)" is the > > recommended config from RedHat? Almost 80% cluster requests from our > > customers are around two-nodes-cluster(10% is RAC and the left is hpc > > cluster), We really want to provide our customers a simple and solid > > cluster config in their production environment, Most customer > > configure their HA cluster as Active/passive so GFS is not necessary > > to them and they even don't want GFS exists in their two-nodes-cluster > > system. > > If you have access to shared storage, then a two node cluster with > quorum disk/fencing would be a better configuration and could be the > recommended configuration. 
However, there are still cases where you > could have a two node cluster with no shared storage. Depends on how > the application is sharing state or accessing data. But for an > active/passive two node failover cluster, I can see where the quorum > disk will be very popular. > > Kevin > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Thank you very much. Jun -------------- next part -------------- An HTML attachment was scrubbed... URL: From vlaurenz at advance.net Tue Jun 13 14:15:29 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Tue, 13 Jun 2006 10:15:29 -0400 Subject: [Linux-cluster] Send notification when a node is fenced? Message-ID: <448EC881.5030509@advance.net> Would someone kindly tell me how to configure cluster suite to notify (via email) when a node has been fenced or when a node leaves or joins a cluster? I can't seem to find any documentation on this. Thanks! From jmy at lolita.engr.sgi.com Tue Jun 13 15:23:44 2006 From: jmy at lolita.engr.sgi.com (James Yarbrough) Date: Tue, 13 Jun 2006 08:23:44 -0700 (PDT) Subject: [Linux-cluster] Re: [NFS] [RFC] NLM lock failover admin interface References: <1150089943.26019.18.camel@localhost.localdomain> Message-ID: <200606131523.k5DFNila1061570@lolita.engr.sgi.com> > There seems to be an unstated assumption here that there is one > virtual IP per exported filesystem.?? Is that true? This is the normal case for such HA services. There may actually be a single IP address covering multiple filesystems and/or NFS exports. > I think that maybe the right thing to do is *not* drop the locks on a > particular filesystem, but to drop the locks made to a particular > virtual IP. For filesystems such as ext2 or xfs, you unmount the filesystem on the current server and mount it on the new server when doing a failover. In this case, you have to be able to get rid of all the locks first and you do that for the entire filesystem. For a cluster filesystem such as cxfs, you don't actually unmount the filesystem, so you really need the per-IP address approach. > If I want to force-unmount a filesystem, I need to unexport it, and I > need to kill all the locks.?? Currently you can only remove locks from > all filesystems, which might not be ideal. This is definitely less than ideal. This will force notification and reclaim for all exported filesystems. This can be a significant problem. jmy at sgi.com 650 933 3124 Why is there a snake in my Coke? From gradimir_starovic at symantec.com Wed Jun 14 12:40:37 2006 From: gradimir_starovic at symantec.com (Gradimir Starovic) Date: Wed, 14 Jun 2006 13:40:37 +0100 Subject: [Linux-cluster] Red Hat Summit presentations Message-ID: "NFS for Linux" link gives Page not found. Is it just me or it needs fixing? regards Gradimir > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Riaan > van Niekerk > Sent: 14 June 2006 12:27 > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Red Hat Summit presentations > > For anyone interested in the Red Hat Summit presentations, > they are available on-line now. 
> > The presentations on Clustering and Storage are available > here: http://www.redhat.com/promo/summit/presentations/cns.htm > > Riaan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From bmarzins at redhat.com Thu Jun 15 20:52:00 2006 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Thu, 15 Jun 2006 15:52:00 -0500 Subject: [Linux-cluster] GFS locking issues In-Reply-To: <20060615190959.GB1913@redhat.com> References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> <20060615190959.GB1913@redhat.com> Message-ID: <20060615205200.GC12574@ether.msp.redhat.com> On Thu, Jun 15, 2006 at 02:09:59PM -0500, David Teigland wrote: > On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote: > > > Is there any ideas of how to fix this? I mean either the reason ('D' > > state of killed httpd-s) or consequences (the GFS filesystem fully or > > partially become unavailable after this). > > > > I also appreciate any help with debugging the problem. > > > > I tried gfs_tool lockdump with decipher_lockstate_dump tool. > > I don't see anything wrong in the lockdumps you gave, although I'm not an > expert at interpreting gfs lockdumps. Could you do a ps showing the wchan > for those processes? Using sysrq to get a stack dump would also be useful. > You might also do a dlm lock dump and pick out those locks: > echo "lockspace name" >> /proc/cluster/dlm_locks > cat /proc/cluster/dlm_locks > > I/O stuck in gnbd could also be a problem, I'm not sure what the signs of > that might be apart from possibly the wchan. To check for GNBD lockups, there are a couple of useful places to look. Are there any messages in the logs of any of the nodes (particularly the hanging gnbd client and the gnbd server node) that provide any clues? Do a # gnbd_import -l on all the gnbd client machines. The 'State:' line is the important one. For all the devices you are using, the first to values should be "Open" and "Connected". If it doesn't say "Connected" you've lost connection to the server for some reason. The log messages should provide a clue. If the last value says "Clear", then there is no outstanding IO to the server. If it says "Pending", do a # cat /sys/class/gnbd/gnbd/waittime Run the command a couple of times. This is the time since the server has last fulfilled an outstanding request. If there are no oustanding requests, it will be -1. If the value keeps getting larger, then there is pending IO to the server. Run # gnbd_export -L on the server machine. You should see a process for each exported device for each client. If there is pending IO to the server, a stack trace of the server process will show where it's stuck. The other place GNBD could be stuck is waiting on some internal lock. A stack should point that out. 
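If it helps, the client-side checks above can be strung together into a small script. Every command here is taken from the steps just described; the only assumption is that the devices show up as /sys/class/gnbd/gnbd0, gnbd1, and so on:

  #!/bin/sh
  # Quick gnbd client health check based on the steps above.
  gnbd_import -l        # the first two State: values should be "Open" and "Connected"
  for d in /sys/class/gnbd/gnbd*; do
      [ -f "$d/waittime" ] || continue
      echo "$(basename "$d"): waittime=$(cat "$d/waittime")"   # -1 means no pending IO
  done
  # On the server, "gnbd_export -L" should show one process per exported device
  # per client; a stack trace of a stuck process shows where it is waiting.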
-Ben > > Dave > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From neilb at suse.de Fri Jun 16 06:09:49 2006 From: neilb at suse.de (Neil Brown) Date: Fri, 16 Jun 2006 16:09:49 +1000 Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface In-Reply-To: message from William A.(Andy) Adamson on Thursday June 15 References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <1150293654.28264.91.camel@localhost.localdomain> <20060615140743.36CDC1BBAD@citi.umich.edu> Message-ID: <17554.19245.914383.436585@cse.unsw.edu.au> On Thursday June 15, andros at citi.umich.edu wrote: > this discusion has centered around removing the locks of an export. > we also want the interface to ge able to remove the locks owned by a single > client. this is needed to enable client migration between replica's or between > nodes in a cluster file system. it is not acceptable to place an entire export > in grace just to move a small number of clients. Hmmmm.... You want to remove all the locks owned by a particular client with the intension of reclaiming those locks against a different NFS server (on a cluster filesystem) and you don't want to put the whole filesystem into grace mode while doing it. Is that correct? Sounds extremely racy to me. Suppose some other client takes a conflicting lock between dropping them on one server and claiming them on the other? That would be bad. The purpose of the grace mode is precisely to avoid this sort of race. It would seem that what you "really" want to do is to tell the cluster filesystem to migrate the locks to a different node and some how tell lockd about out. Is there a comprehensive design document about how this is going to work, because I'm feeling doubtful. For the 'between replicas' case - I'm not sure locking makes sense. Locking on a read-only filesystem is pretty pointless, and presumably replicas are read-only??? Basically, dropping locks that are expected to be picked up again, without putting the whole filesystem into a grace period simply doesn't sound workable to me. Am I missing something? NeilBrown From akornev at gmail.com Fri Jun 16 15:37:14 2006 From: akornev at gmail.com (Anton Kornev) Date: Fri, 16 Jun 2006 18:37:14 +0300 Subject: [Linux-cluster] GFS locking issues In-Reply-To: <20060615190959.GB1913@redhat.com> References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> <20060615190959.GB1913@redhat.com> Message-ID: <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com> David, Benjamin, thanks for you assistance! I reproduced the problem and I have done the tests you mentioned. Regarding gndb: gnbd_import -l tool reports "Open, Connected" state and gndb_export -L on the gnbd server also shows all the hosts importing this partition. The " cat /sys/class/gnbd/gnbd0/waittime" also shows no data pending (returns -1). 
Though in the message log there were some strange lines about gnbd failures appeared after the "killall httpd" command was issued: gnbd (pid 5836: alogc.pl) got signal 9 gnbd0: Send control failed (result -4) gnbd (pid 5836: alogc.pl) got signal 15 gnbd0: Send control failed (result -4) gnbd (pid 5911: httpd) got signal 15 gnbd0: Send control failed (result -4) gnbd (pid 5897: httpd) got signal 15 gnbd0: Send control failed (result -4) gnbd (pid 5915: httpd) got signal 15 gnbd0: Send control failed (result -4) gnbd (pid 5911: httpd) got signal 15 gnbd0: Send control failed (result -4) Regarding ps info on wchan - it looks like this: ps axl info on IO-waiting processes: F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND 1 0 51 6 15 0 0 0 wait_o D ? 0:00 [pdflush] 1 0 5771 6 5 -10 0 0 lock_p D< ? 0:00 [lock_dlm1] 1 0 5776 1 15 0 0 0 - D ? 0:00 [gfs_logd] 1 0 5777 1 15 0 0 0 - D ? 0:00 [gfs_quotad] 1 0 5778 1 15 0 0 0 - D ? 0:00 [gfs_inoded] 5 0 5892 1 16 0 23440 912 - Ds ? 0:00 /usr/system/apache/bin/httpd 5 48 5895 5892 17 0 23472 984 glock_ D ? 0:00 /usr/system/apache/bin/httpd 5 48 5896 5892 17 0 23440 980 glock_ D ? 0:00 /usr/system/apache/bin/httpd 5 48 5897 5892 17 0 23440 920 glock_ D ? 0:00 /usr/system/apache/bin/httpd 5 48 5911 5892 17 0 23440 920 glock_ D ? 0:00 /usr/system/apache/bin/httpd 5 48 5915 5892 17 0 23440 920 wait_o D ? 0:00 /usr/system/apache/bin/httpd 4 0 5930 2547 34 19 52780 992 wait_o DN ? 0:00 /bin/sh -c run-parts /etc/cron.da ily Not truncated version of the "wchan" field for all the IO-waiting processes is below: bash-3.00# ps ax -o pid,state,wchan:32,ucomm |grep D PID S WCHAN COMMAND 51 D wait_on_buffer pdflush 5771 D lock_page lock_dlm1 5776 D - gfs_logd 5777 D - gfs_quotad 5778 D - gfs_inoded 5892 D - httpd 5895 D glock_wait_internal httpd 5896 D glock_wait_internal httpd 5897 D glock_wait_internal httpd 5911 D glock_wait_internal httpd 5915 D wait_on_buffer httpd 5930 D wait_on_buffer sh Finally I have taken the "sysrq" info on these processes. 
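For anyone who wants to gather the same data: the per-process stack dumps below are the sort of output the magic SysRq 't' trigger produces, and the usual way to drive it from a shell is roughly:

  echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
  echo t > /proc/sysrq-trigger        # dump stack traces of all tasks to the kernel log
  dmesg                               # or read them back from /var/log/messages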
pdflush D ffffffff8014aabc 0 51 6 53 50 (L-TLB) 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00 0000000000000216 ffffffffa0042916 000001011aca60c0 0000000000000008 000001011fdef7f0 0000000000000dfa Call Trace:{:dm_mod:dm_request+396} {keventd_create_kthread+0} {io_schedule+38} {__wait_on_buffer+125} {bh_wake_function+0} {bh_wake_function+0} {:gfs:gfs_logbh_wait+49} {:gfs:disk_commit+794} {:gfs:log_refund+111} {:gfs:log_flush_internal+510} {sync_supers+167} {wb_kupdate+36} {pdflush+323} {wb_kupdate+0} {pdflush+0} {kthread+200} {child_rip+8} {keventd_create_kthread+0} {kthread+0} {child_rip+0} lock_dlm1 D 000001000c0096e0 0 5771 6 5772 5766 (L-TLB) 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069 000001011420b030 0000000000000069 000001000c00a940 000000010000eb10 000001011a887030 0000000000001cae Call Trace:{__generic_unplug_device+19} {io_schedule+38} {__lock_page+191} {page_wake_function+0} {page_wake_function+0} {truncate_inode_pages+519} {:gfs:gfs_inval_page+63} {:gfs:drop_bh+233} {:gfs:gfs_glock_cb+194} {:lock_dlm:dlm_async+1989} {default_wake_function+0} {keventd_create_kthread+0} {:lock_dlm:dlm_async+0} {keventd_create_kthread+0} {kthread+200} {child_rip+8} {keventd_create_kthread+0} {kthread+0} {child_rip+0} gfs_logd D 0000000000000000 0 5776 1 5777 5775 (L-TLB) 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85 000001011387fe58 ffffffff80304add ffffffff803cca80 0000000000000246 00000101143fe030 00000000000000b5 Call Trace:{thread_return+0} {thread_return+88} {:gfs:lock_on_glock+112} {__down_write+134} {:gfs:gfs_ail_empty+56} {:gfs:gfs_logd+77} {child_rip+8} {dummy_d_instantiate+0} {:gfs:gfs_logd+0} {child_rip+0} gfs_quotad D 0000000000000000 0 5777 1 5778 5776 (L-TLB) 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85 0000010113881eb8 ffffffff80304add 000001011ff87030 0000000100000074 000001011430f7f0 0000000000000128 Call Trace:{thread_return+0} {thread_return+88} {__down_write+134} {:gfs:gfs_quota_sync+226} {:gfs:gfs_quotad+127} {child_rip+8} {dummy_d_instantiate+0} {dummy_d_instantiate+0} {dummy_d_instantiate+0} {:gfs:gfs_quotad+0} {child_rip+0} gfs_inoded D 0000000000000000 0 5778 1 5807 5777 (L-TLB) 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0 0000000000000000 ffffffff80304a85 0000010113883ec8 0000000180304add 000001011e2937f0 00000000000000c2 Call Trace:{thread_return+0} {__down_write+134} {:gfs:unlinked_find+115} {:gfs:gfs_unlinked_dealloc+25} {:gfs:gfs_inoded+66} {child_rip+8} {:gfs:gfs_inoded+0} {child_rip+0} httpd D ffffffff80304190 0 5892 1 5893 5826 (NOTLB) 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001 0000000000000000 0000000000000000 0000010114667980 0000000111b75bc0 00000101143fe7f0 00000000000009ad Call Trace:{__down+147} {default_wake_function+0} {generic_file_write_nolock+158} {__down_failed+53} {:gfs:.text.lock.dio+95} {:gfs:gfs_trans_add_bh+205} {:gfs:do_write_buf+1138} {:gfs:walk_vm+278} {:gfs:do_write_buf+0} {:gfs:do_write_buf+0} {:gfs:__gfs_write+201} {vfs_write+207} {sys_write+69} {system_call+126} httpd D 0000010110ad7d48 0 5895 5892 5896 5893 (NOTLB) 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075 0000010117002030 0000000000000075 000001000c002940 0000000000000001 00000101170027f0 000000000001300e Call Trace:{try_to_wake_up+863} {wait_for_completion+167} {default_wake_function+0} {default_wake_function+0} {:gfs:glock_wait_internal+350} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {:gfs:gfs_private_nopage+84} 
{do_no_page+1003} {do_wp_page+948} {handle_mm_fault+343} {get_signal_to_deliver+1118} {do_page_fault+518} {thread_return+0} {thread_return+88} {error_exit+0} httpd D 0000010110b5bd48 0 5896 5892 5897 5895 (NOTLB) 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075 00000101114787f0 0000000000000075 000001000c002940 0000000000000001 0000010117002030 000000000000fb3e Call Trace:{try_to_wake_up+863} {wait_for_completion+167} {default_wake_function+0} {default_wake_function+0} {:gfs:glock_wait_internal+350} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {:gfs:gfs_private_nopage+84} {do_no_page+1003} {do_wp_page+948} {handle_mm_fault+343} {get_signal_to_deliver+1118} {do_page_fault+518} {sys_accept+327} {pipe_read+26} {error_exit+0} httpd D 0000000000000000 0 5897 5892 5911 5896 (NOTLB) 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075 0000010117002030 0000000000000075 000001000c00a940 000000001b16e030 00000101114787f0 000000000000fbe0 Call Trace:{__generic_unplug_device+19} {wait_for_completion+167} {default_wake_function+0} {default_wake_function+0} {:gfs:glock_wait_internal+350} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {:gfs:gfs_private_nopage+84} {do_no_page+1003} {do_wp_page+948} {handle_mm_fault+343} {get_signal_to_deliver+1118} {do_page_fault+518} {thread_return+0} {thread_return+88} {error_exit+0} httpd D 00000101100c3d48 0 5911 5892 5915 5897 (NOTLB) 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075 00000101170027f0 0000000000000075 000001000c002940 0000000000000000 000001011b16e030 000000000000187e Call Trace:{try_to_wake_up+863} {wait_for_completion+167} {default_wake_function+0} {default_wake_function+0} {:gfs:glock_wait_internal+350} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {:gfs:gfs_private_nopage+84} {do_no_page+1003} {do_wp_page+948} {handle_mm_fault+343} {get_signal_to_deliver+1118} {do_page_fault+518} {thread_return+0} {thread_return+88} {error_exit+0} httpd D 0000000000006a36 0 5915 5892 5911 (NOTLB) 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791 0000000000000000 0000000000000000 0000030348ac8c1c 0000000114a217f0 0000010114c997f0 000000000000076a Call Trace:{:dlm:lkb_swqueue+43} {io_schedule+38} {__wait_on_buffer+125} {bh_wake_function+0} {bh_wake_function+0} {:gfs:gfs_dreread+154} {:gfs:gfs_dread+40} {:gfs:gfs_get_meta_buffer+201} {:gfs:gfs_copyin_dinode+23} {:gfs:inode_go_lock+38} {:gfs:glock_wait_internal+563} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {:gfs:gfs_private_nopage+84} {do_no_page+1003} {do_wp_page+948} {handle_mm_fault+343} {get_signal_to_deliver+1118} {do_page_fault+518} {thread_return+0} {thread_return+88} {error_exit+0} sh D 000000000000001a 0 5930 2547 (NOTLB) 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00 0000010111293d88 0000000000000000 00000100dfc02400 0000000000010000 00000101148557f0 0000000000002010 Call Trace:{io_schedule+38} {__wait_on_buffer+125} {bh_wake_function+0} {bh_wake_function+0} {:gfs:gfs_dreread+154} {:gfs:gfs_dread+40} {:gfs:gfs_get_meta_buffer+201} {:gfs:gfs_copyin_dinode+23} {:gfs:inode_go_lock+38} {:gfs:glock_wait_internal+563} {:gfs:gfs_glock_nq+961} {:gfs:gfs_glock_nq_init+20} {dummy_inode_permission+0} {:gfs:gfs_permission+64} {dput+56} {permission+51} {__link_path_walk+372} {link_path_walk+82} {do_page_fault+575} {__link_path_walk+1658} {link_path_walk+82} {do_page_fault+575} {path_lookup+451} {__user_walk+47} {vfs_stat+24} {do_page_fault+575} {sys_newstat+17} {error_exit+0} {system_call+126} Please, 
let me know if it gives you any clues. On 6/15/06, David Teigland wrote: > > On Thu, Jun 15, 2006 at 01:43:25AM +0300, Anton Kornev wrote: > > > Is there any ideas of how to fix this? I mean either the reason ('D' > > state of killed httpd-s) or consequences (the GFS filesystem fully or > > partially become unavailable after this). > > > > I also appreciate any help with debugging the problem. > > > > I tried gfs_tool lockdump with decipher_lockstate_dump tool. > > I don't see anything wrong in the lockdumps you gave, although I'm not an > expert at interpreting gfs lockdumps. Could you do a ps showing the wchan > for those processes? Using sysrq to get a stack dump would also be > useful. > You might also do a dlm lock dump and pick out those locks: > echo "lockspace name" >> /proc/cluster/dlm_locks > cat /proc/cluster/dlm_locks > > I/O stuck in gnbd could also be a problem, I'm not sure what the signs of > that might be apart from possibly the wchan. > > Dave > > -- Best Regards, Anton Kornev. -------------- next part -------------- An HTML attachment was scrubbed... URL: From andros at citi.umich.edu Fri Jun 16 15:39:04 2006 From: andros at citi.umich.edu (William A.(Andy) Adamson) Date: Fri, 16 Jun 2006 11:39:04 -0400 Subject: [NFS] [Linux-cluster] Re: [RFC] NLM lock failover admin interface In-Reply-To: <17554.19245.914383.436585@cse.unsw.edu.au> References: <1150089943.26019.18.camel@localhost.localdomain> <17550.11870.186706.36949@cse.unsw.edu.au> <1150268091.28264.75.camel@localhost.localdomain> <1150293654.28264.91.camel@localhost.localdomain> <20060615140743.36CDC1BBAD@citi.umich.edu> <17554.19245.914383.436585@cse.unsw.edu.au> Message-ID: <20060616153904.6A9A21BCBD@citi.umich.edu> > On Thursday June 15, andros at citi.umich.edu wrote: > > this discusion has centered around removing the locks of an export. > > we also want the interface to ge able to remove the locks owned by a single > > client. this is needed to enable client migration between replica's or between > > nodes in a cluster file system. it is not acceptable to place an entire export > > in grace just to move a small number of clients. > > Hmmmm.... > You want to remove all the locks owned by a particular client > with the intension of reclaiming those locks against a different NFS > server (on a cluster filesystem) > and you don't want to put the whole filesystem into grace mode while > doing it. > > Is that correct? yes. > > Sounds extremely racy to me. Suppose some other client takes a > conflicting lock between dropping them on one server and claiming them > on the other? That would be bad. The purpose of the grace mode is > precisely to avoid this sort of race. the idea is that the underlying file system can place only the files with locks held by the migrating client(s) into grace, leaving all other files for normal operation. the migrating (nfsv4) client then reclaims opens, locks and delegations on the new server. its just reducing the scope of the grace period. > > It would seem that what you "really" want to do is to tell the cluster > filesystem to migrate the locks to a different node and some how tell > lockd about out. what we really want is for the cluster file system to share the locks between the original node and the new node. then the client can simply be redirected and no grace period or reclaim is needed. this is much harder to code than a reduced grace period as describe above. from what we hear, lustre has this functionality. 
either way, the files with locks held by the migrating client need to be identified by both the lock manager (lockd/nfsv4 server) and the underlying fs. > > Is there a comprehensive design document about how this is going to > work, because I'm feeling doubtful. we have a work in progress - it's not done but may help describe our thinking. http://wiki.linux-nfs.org/index.php/Recovery_and_migration > > For the 'between replicas' case - I'm not sure locking makes sense. > Locking on a read-only filesystem is pretty pointless, and presumably > replicas are read-only??? nope. we have a promising prototye read/write replica scheme that we are testing. http://www.citi.umich.edu/techreports/reports/citi-tr-06-3.pdf i agree this is an outlying case.... but another immediate consumer of such an iterface would be an administator who needs to remove the locks for a client. -->Andy > > Basically, dropping locks that are expected to be picked up again, > without putting the whole filesystem into a grace period simply > doesn't sound workable to me. > > Am I missing something? > > NeilBrown > > > _______________________________________________ > NFS maillist - NFS at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nfs From sunjw at onewaveinc.com Fri Jun 16 14:38:58 2006 From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=) Date: Fri, 16 Jun 2006 22:38:58 +0800 Subject: [Linux-cluster] gfs withdrawed in function xmote_bh with ret = 0x00000002 Message-ID: Hi,all I run the latest STABLE cluster code with 3 nodes, I get the message on one node after about 38 hours as: <-- Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: function = xmote_bh Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock. c, line = 1093 Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: time = 1150408904 Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002 --> My test program has 'df', 'write', 'ls' and 'read'. and each node connect to RAID controller's host port directly with FC. What would be the problem? Thanks for any reply, Luckey From teigland at redhat.com Fri Jun 16 16:37:53 2006 From: teigland at redhat.com (David Teigland) Date: Fri, 16 Jun 2006 11:37:53 -0500 Subject: [Linux-cluster] Re: gfs withdrawed in function xmote_bh with ret = 0x00000002 In-Reply-To: References: Message-ID: <20060616163753.GB18872@redhat.com> On Fri, Jun 16, 2006 at 10:38:58PM +0800, ?????? wrote: > Hi,all > > I run the latest STABLE cluster code with 3 nodes, > I get the message on one node after about 38 hours as: > <-- > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: function = xmote_bh > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock. 
> c, line = 1093 > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: time = 1150408904 > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O > Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw > Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory > Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn > Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002 > --> > My test program has 'df', 'write', 'ls' and 'read'. > and each node connect to RAID controller's host port directly with FC. Hi, I've attached a small patch to print more information and call BUG instead of withdrawing. It may also be helpful to see a dlm lock dump and a gfs_tool lockdump on the machine after you hit the BUG. Thanks, Dave -------------- next part -------------- --- ./glock.c.orig 2006-06-16 11:17:48.313980418 -0500 +++ ./glock.c 2006-06-16 11:31:20.617855661 -0500 @@ -30,6 +30,9 @@ #include "quota.h" #include "recovery.h" +int dump_glock(struct gfs_glock *gl, char *buf, unsigned int size, + unsigned int *count) + /* Must be kept in sync with the beginning of struct gfs_glock */ struct glock_plug { struct list_head gl_list; @@ -1090,9 +1093,15 @@ spin_unlock(&gl->gl_spin); } else { - if (gfs_assert_withdraw(sdp, FALSE) == -1) - printk("GFS: fsid=%s: ret = 0x%.8X\n", - sdp->sd_fsname, ret); + char *buf; + int junk; + printk("GFS: fsid=%s: ret = 0x%.8X prev_state = %d\n", + sdp->sd_fsname, ret, prev_state); + buf = kmalloc(4096); + memset(buf, 0, sizeof(buf)); + dump_glock(gl, buf, 4096, &junk); + printk("%s\n", buf); + BUG(); } if (glops->go_xmote_bh) From aberoham at gmail.com Fri Jun 16 19:36:40 2006 From: aberoham at gmail.com (aberoham at gmail.com) Date: Fri, 16 Jun 2006 12:36:40 -0700 Subject: [Linux-cluster] recovering from "resource groups locked" error? Message-ID: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com> If clustat reports rgmanager as online, why would any clusvcadm operation fail with "Try again (resource groups locked)" ? Is there any way to recover from that rgmanger failure/error besides resetting the entire cluster? Details -- Yesterday evening a technician connected a Netgear GS748T switch to my network. The new switch somehow caused a storm of traffic that in turn caused a disruption of network connectivity across the entire LAN, including to all of my CS/GFS cluster nodes, for a few minutes until the new switch was removed from the network. This morning when I finally had a chance to investigate I found that all of the cluster members that are supposed to be online were online and that the cluster was quorate. But rgmanager would not work and services running under rgmanager were hung. (The cluster must have become inquorate and blocked access to the shared GFS volume while the outage was in progress. But some of the services and rgmanager never recovered?) I first tried resetting the "lead" member. (This is a pool of mirrored storage servers where the lead member creates a rsync batch off of a main fileserver and all of the other members then replay the rsync batch that is on a shared filesystem against their local filesystem mirror of the main fileserver) No matter what I did rgmanager would not start. 
cman_tool services would report code "S-1,80,4" -- root at gfs05:~ (0)>cman_tool services Service Name GID LID State Code Fence Domain: "default" 1 2 run - [2 1 4 3] DLM Lock Space: "clvmd" 2 3 run - [2 1 4 3] User: "usrm::manager" 0 4 join S-1,80,4 [] Other cluster members would report rgmanager as online, yet when I tried to operate on member services, the operation would fail with "Try again (resource groups locked)". root at gfs06:~ (1)>clustat Member Status: Quorate Member Name Status ------ ---- ------ gfs04 Online, rgmanager gfs05 Online gfs06 Online, Local, rgmanager gfs07 Online, rgmanager gfs08 Offline Service Name Owner (Last) State ------- ---- ----- ------ ----- mapsmirror1 gfs05 started mapsmirror2 gfs06 started mapsmirror3 gfs07 started mapsmirror4 gfs04 started mapsmirror5 (none) stopped root at gfs06:~ (0)>clusvcadm -d mapsmirror1 Member gfs06 disabling mapsmirror1...failed: Try again (resource groups locked) Eventually I just gave up and power cycled all cluster members at ounce. Everything, including rgmanger, then came back online OK. -------------- next part -------------- An HTML attachment was scrubbed... URL: From magobin at gmail.com Fri Jun 16 19:38:39 2006 From: magobin at gmail.com (Alex) Date: Fri, 16 Jun 2006 21:38:39 +0200 Subject: [Linux-cluster] Postfix & Courier in cluster...little question!! Message-ID: <4493088f.3404778b.431a.fffff008@mx.gmail.com> Hi at all....I configured postfix and courier in a cluster, now I have the problem with Maildir directory...in fact if Postfix is on node1 and courier in node2....Courier can't access to maildir (obviously)....is it there anyone that can suggest me a good solution?? One solution may be drbd....but cluster suite doesn't implement anything for this issue??? Thanks in advance Alex From aberoham at gmail.com Fri Jun 16 19:39:59 2006 From: aberoham at gmail.com (aberoham at gmail.com) Date: Fri, 16 Jun 2006 12:39:59 -0700 Subject: [Linux-cluster] Re: recovering from "resource groups locked" error? In-Reply-To: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com> References: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com> Message-ID: <3bdb07840606161239i58fa11cfye5eb83b33da51236@mail.gmail.com> Btw, all members run on 2.6.9-34.ELsmp, cman-1.0.4-0 and cman-kernel-smp-2.6.9-43.8 with rgmanager-1.9.46-0. On 6/16/06, aberoham at gmail.com wrote: > > > If clustat reports rgmanager as online, why would any clusvcadm operation > fail with "Try again (resource groups locked)" ? > > Is there any way to recover from that rgmanger failure/error besides > resetting the entire cluster? > > Details -- > > Yesterday evening a technician connected a Netgear GS748T switch to my > network. The new switch somehow caused a storm of traffic that in turn > caused a disruption of network connectivity across the entire LAN, including > to all of my CS/GFS cluster nodes, for a few minutes until the new switch > was removed from the network. > > This morning when I finally had a chance to investigate I found that all > of the cluster members that are supposed to be online were online and that > the cluster was quorate. But rgmanager would not work and services running > under rgmanager were hung. (The cluster must have become inquorate and > blocked access to the shared GFS volume while the outage was in progress. > But some of the services and rgmanager never recovered?) > > I first tried resetting the "lead" member. 
(This is a pool of mirrored > storage servers where the lead member creates a rsync batch off of a main > fileserver and all of the other members then replay the rsync batch that is > on a shared filesystem against their local filesystem mirror of the main > fileserver) > > No matter what I did rgmanager would not start. cman_tool services would > report code "S-1,80,4" -- > > root at gfs05:~ > (0)>cman_tool services > Service Name GID LID State Code > Fence Domain: "default" 1 2 run - > [2 1 4 3] > > DLM Lock Space: "clvmd" 2 3 run - > [2 1 4 3] > > User: "usrm::manager" 0 4 join > S-1,80,4 > [] > > Other cluster members would report rgmanager as online, yet when I tried > to operate on member services, the operation would fail with "Try again > (resource groups locked)". > > root at gfs06:~ > (1)>clustat > Member Status: Quorate > > Member Name Status > ------ ---- ------ > gfs04 Online, rgmanager > gfs05 Online > gfs06 Online, Local, rgmanager > gfs07 Online, rgmanager > gfs08 Offline > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > mapsmirror1 gfs05 started > mapsmirror2 gfs06 started > mapsmirror3 gfs07 started > mapsmirror4 gfs04 started > mapsmirror5 (none) stopped > root at gfs06:~ > (0)>clusvcadm -d mapsmirror1 > Member gfs06 disabling mapsmirror1...failed: Try again (resource groups > locked) > > Eventually I just gave up and power cycled all cluster members at ounce. > Everything, including rgmanger, then came back online OK. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Fri Jun 16 20:14:49 2006 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 16 Jun 2006 16:14:49 -0400 Subject: [Linux-cluster] recovering from "resource groups locked" error? In-Reply-To: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com> References: <3bdb07840606161236n323d9f41if79abcb6530df747@mail.gmail.com> Message-ID: <1150488889.20766.323.camel@ayanami.boston.redhat.com> On Fri, 2006-06-16 at 12:36 -0700, aberoham at gmail.com wrote: > > If clustat reports rgmanager as online, why would any clusvcadm > operation fail with "Try again (resource groups locked)" ? > > Is there any way to recover from that rgmanger failure/error besides > resetting the entire cluster? Yeah, it's fixed in STABLE and RHEL4 branches at the moment. It was getting locked at the wrong time, and there used to be no way to unlock it. -- Lon From mathieu.avila at seanodes.com Mon Jun 19 11:48:56 2006 From: mathieu.avila at seanodes.com (Mathieu Avila) Date: Mon, 19 Jun 2006 13:48:56 +0200 Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel panics on stress. Message-ID: <44968F28.8070505@seanodes.com> Hello all, (I've already posted this to cluster-devel at redhat.com,and it seems it wasn't the appropriate place as i didn't get any answer. Sorry for the cross-posting.) I have 2 problems: 1) I'm trying to use GFS with Fedora Core 4. It was upgraded to a kernel 2.6.16-1.2111_FC4smp. 
RPM versions are: GFS-kernel-smp-2.6.11.8-20050601.152643.FC4.25 GFS-6.1.0-3 GFS-kernheaders-2.6.11.8-20050601.152643.FC4.25 dlm-kernheaders-2.6.11.5-20050601.152643.FC4.22 dlm-kernel-smp-2.6.11.5-20050601.152643.FC4.22 dlm-1.0.0-3 gnbd-kernheaders-2.6.11.2-20050420.133124.FC4.58 gnbd-1.0.0-1 There was a problem to install the following packages,and the following patches were necessary: -GFS-kernel --- gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_file.c.orig 2006-06-01 13:57:58.000000000 +0200 +++ gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_file.c 2006-06-01 13:57:24.000000000 +0200 @@ -931,12 +931,12 @@ if (!access_ok(VERIFY_READ, buf, size)) return -EFAULT; - down(&inode->i_sem); + mutex_lock(&inode->i_mutex); if (file->f_flags & O_DIRECT) count = walk_vm(file, (char *)buf, size, offset, do_write_direct); else count = walk_vm(file, (char *)buf, size, offset, do_write_buf); - up(&inode->i_sem); + mutex_unlock(&inode->i_mutex); return count; } --- gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_fstype.c.orig 2006-06-01 14:04:16.000000000 +0200 +++ gfs-kernel-2.6.11.8-20050601.152643.FC4/src/gfs/ops_fstype.c 2006-06-01 14:05:29.000000000 +0200 @@ -712,12 +712,12 @@ goto out; } else { char buf[BDEVNAME_SIZE]; - + unsigned long bsize; sb->s_flags = flags; strlcpy(sb->s_id, bdevname(real, buf), sizeof(sb->s_id)); - sb->s_old_blocksize = block_size(real); - sb_set_blocksize(sb, sb->s_old_blocksize); - set_blocksize(real, sb->s_old_blocksize); + bsize = block_size(real); + sb_set_blocksize(sb, bsize); + set_blocksize(real, bsize); error = fill_super(sb, data, (flags & MS_VERBOSE) ? 1 : 0); if (error) { up_write(&sb->s_umount); @@ -748,7 +748,7 @@ { struct block_device *diaper = sb->s_bdev; struct block_device *real = gfs_diaper_2real(diaper); - unsigned long bsize = sb->s_old_blocksize; + unsigned long bsize = block_size(real); generic_shutdown_super(sb); set_blocksize(diaper, bsize); I am quite confident about "file_ops.c" as it looks like the latest version for 2.6.15: http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/gfs-kernel/src/gfs/ops_file.c?rev=1.16.6.2.2.4&content-type=text/x-cvsweb-markup&cvsroot=cluster&only_with_tag=gfs-kernel_2_6_15_2 For "ops_fstype.c", it should be ok, unless you see obvious errors. - gnbd-kernel: --- gnbd-kernel-2.6.11.2-20050420.133124/src/gnbd.c.orig 2006-06-01 13:46:35.000000000 +0200 +++ gnbd-kernel-2.6.11.2-20050420.133124/src/gnbd.c 2006-06-01 13:47:03.000000000 +0200 @@ -180,9 +180,9 @@ set_capacity(dev->disk, size); bdev = bdget_disk(dev->disk, 0); if (bdev) { - down(&bdev->bd_inode->i_sem); + mutex_lock(&bdev->bd_inode->i_mutex); i_size_write(bdev->bd_inode, (loff_t)size << 9); - up(&bdev->bd_inode->i_sem); + mutex_unlock(&bdev->bd_inode->i_mutex); bdput(bdev); } up(&dev->do_it_lock); @@ -281,7 +281,7 @@ spin_lock_irqsave(q->queue_lock, flags); if (!end_that_request_first(req, uptodate, req->nr_sectors)) { - end_that_request_last(req); + end_that_request_last(req, 0); } spin_unlock_irqrestore(q->queue_lock, flags); } This one is quite straightforward. 2) Once compiled and run, i get 1 node running GNBD and exporting one of its disks. 3 other nodes are running as client for GNBD, and i mount a GFS on them, although all 4 nodes participate to a GFS cluster. 
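For context, the setup in (2) corresponds to roughly the following commands. The device and export names are invented, and the flags are quoted from memory of the gnbd usage notes, so treat this as a sketch of the topology rather than an exact recipe:

  # on the exporting node (gnbd_serv running):
  gnbd_export -d /dev/sdb1 -e shared0
  # on each of the three client nodes:
  gnbd_import -i <server-name>
  mount -t gfs /dev/gnbd/shared0 /mnt/gfs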
(standard config : dlm, cman) I have tried to loop 100 times over parallel "bonnie++" on the 3 nodes, with: bonnie++ -u 0:0 -d /mnt/gfs -x 100 One of the nodes crashed before the end before the 10th loop, with the following panic: Unable to handle kernel paging request at 0000000000200220 RIP: ^M{:gfs:gfs_depend_add+430} ^MPGD 306d7067 PUD 37532067 PMD 0 ^MOops: 0000 [1] SMP ^Mlast sysfs file: /class/gnbd/gnbd0/waittime ^MCPU 1 ^MModules linked in: gnbd(U) lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) ipv6 parport_pc lp parport autofs4 rfcomm l2cap bluetooth sunrpc pcmcia yent a_socket rsrc_nonstatic pcmcia_core dm_mod video button battery ac uhci_hcd ehci_hcd i2c_i801 i2c_core tg3 e1000 ext3 jbd ata_piix libata sd_mod scsi_mod ^MPid: 5679, comm: bonnie++ Tainted: GF 2.6.16-1.2111_FC4smp #1 ^MRIP: 0010:[] {:gfs:gfs_depend_add+430} ^MRSP: 0018:ffff81002bfddb38 EFLAGS: 00010206 ^MRAX: ffff810037571200 RBX: 0000000000003a98 RCX: 0000000000000002 ^MRDX: ffff810037571338 RSI: ffff81002bfddb08 RDI: ffff810001dd5c40 ^MRBP: ffffc2001017a000 R08: ffffc2001017c650 R09: 0000000000000040 ^MR10: 0000000000000040 R11: 0000000000040000 R12: 0000000000003a98 ^MR13: 00000001002ac770 R14: 00000000002001f0 R15: ffffc2001017a258 ^MFS: 00002aaaaaab8380(0000) GS:ffff8100021d9f40(0000) knlGS:0000000000000000 ^MCS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b ^MCR2: 0000000000200220 CR3: 0000000035b0b000 CR4: 00000000000006e0 ^MProcess bonnie++ (pid: 5679, threadinfo ffff81002bfdc000, task ffff81003ecd5860) ^MStack: ffff810037571200 000000018832af2b 0000000000d633e7 ffff810006d384a8 ^M ffff810022a0d978 0000000000d633e8 ffffc2001017a000 0000000000000001 ^M ffff810009bd4490 ffffffff8832b99b ^MCall Trace: {:gfs:gfs_wipe_buffers+842} ^M {:gfs:gfs_inode_dealloc+1023} {:gfs:gfs_unlinked_limit+230} ^M {:gfs:gfs_unlink+60} {:gfs:gfs_permission+483} ^M {permission+114} {vfs_unlink+203} ^M {do_unlinkat+184} {syscall_trace_enter+181} ^M {tracesys+113} {tracesys+209} ^MCode: 4d 8b 66 30 4c 89 ff e8 34 04 00 f8 8b 9d 94 02 00 00 4c 89 ^MRIP {:gfs:gfs_depend_add+430} RSP ^MCR2: 0000000000200220 ^M <0>Kernel panic - not syncing: Oops ^MCall Trace: {panic+133} {_spin_unlock_irqrestore+11} ^M {oops_end+71} {do_page_fault+1770} ^M {kmem_freepages+191} {slab_destroy+151} ^M {error_exit+0} {:gfs:gfs_depend_add+430} ^M {:gfs:gfs_depend_add+488} {:gfs:gfs_wipe_buffers+842} ^M {:gfs:gfs_inode_dealloc+1023} {:gfs:gfs_unlinked_limit+230} ^M {:gfs:gfs_unlink+60} {:gfs:gfs_permission+483} ^M {permission+114} {vfs_unlink+203} ^M {do_unlinkat+184} {syscall_trace_enter+181} ^M {tracesys+113} {tracesys+209} This is 100% reproducible. Any thoughts on this ? Maybe it has already been corrected in a more recent version ? -- Mathieu Avila From wcheng at redhat.com Mon Jun 19 13:59:56 2006 From: wcheng at redhat.com (Wendy Cheng) Date: Mon, 19 Jun 2006 09:59:56 -0400 Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel panics on stress. In-Reply-To: <44968F28.8070505@seanodes.com> References: <44968F28.8070505@seanodes.com> Message-ID: <4496ADDC.8010206@redhat.com> Mathieu Avila wrote: > Hello all, > (I've already posted this to cluster-devel at redhat.com,and it seems it > wasn't the appropriate place as i didn't get any answer. Sorry for the > cross-posting.) You posted to the right list (cluster-devel) and we've checked into the issues over the weekend. 
CVS head should have the correct chagnes now: CVSROOT: /cvs/cluster Module name: cluster Changes by: wcheng sourceware org 2006-06-17 06:38:23 Modified files: gfs-kernel/src/gfs: ops_file.c ops_fstype.c Log message: Sync with base kernel data structure changes: 1. i_sem (in struct inode) is replaced by i_mutex. 2. s_old_blocksize (in struct super_block) no longer exists. Thank to Mathieu Avila pointed this out. -- Wendy From aberoham at gmail.com Mon Jun 19 18:30:38 2006 From: aberoham at gmail.com (aberoham at gmail.com) Date: Mon, 19 Jun 2006 11:30:38 -0700 Subject: [Linux-cluster] clucron.sh (Re: Centralized Cron) Message-ID: <3bdb07840606191130v200daacbob41087471c9e2ac4@mail.gmail.com> As a simple work-around solution to the desire posted in an earlier thread regarding how to best handle cluster-dependent cron jobs, I came up with the following script. The theory of operation is this: install the same cluster-depedent cronjobs on all members but prefice the cron command with clucron.sh [cluster service] [real cron cmd]. clucron.sh verifies the status of the cluster and punts if the service that the cron job is supposed to run against is not currently assigned and running on the particular cluster member. If the particular cluster member IS running the specified service, the cron job command is ran as usual. Note: "clustat -s [service]" functionality required for the attached script is missing in rgmanager-1.9.46 RPM. See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185952 and download Mr. Hohberger's fixed RPMs before trying clucron.sh. Abe -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: clucron.sh Type: application/x-sh Size: 1195 bytes Desc: not available URL: From jason at monsterjam.org Mon Jun 19 20:59:22 2006 From: jason at monsterjam.org (Jason) Date: Mon, 19 Jun 2006 16:59:22 -0400 Subject: [Linux-cluster] servers crashing while not doing much. Message-ID: <20060619205922.GA10200@monsterjam.org> hey folks, I have 2 nodes running GFS 6.1.5 [root at tf1 ~]# rpm -qa | grep -i gfs GFS-6.1.5-0 GFS-kernheaders-2.6.9-49.1 GFS-kernel-smp-2.6.9-49.1 [root at tf1 ~]# rpm -qa | grep -i ccs ccs-devel-1.0.3-0 ccs-1.0.3-0 [root at tf1 ~]# [root at tf1 ~]# uname -a Linux tf1.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux [root at tf1 ~]# and last week, we had them both go down on us unexpectedly. one had paniced and the other was powered off.. these systems are NOT in production yet, so there was some data on the GFS partition, but im pretty sure that there was not much activity when the boxes went down. Any help on what to do about this would be appreciated.. Here is the log from the one that panicd. Jun 10 03:59:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45030 seconds. Jun 10 03:59:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45060 seconds. Jun 10 04:00:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45090 seconds. Jun 10 04:00:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45120 seconds. Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session opened for user root by (uid=0) Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session closed for user root Jun 10 04:01:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45150 seconds. Jun 10 04:01:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45180 seconds. 
Jun 10 04:02:01 tf1 crond(pam_unix)[15620]: session opened for user root by (uid=0) Jun 10 04:02:03 tf1 kernel: des 1 Jun 10 04:02:03 tf1 kernel: clvmd total nodes 1 Jun 10 04:02:03 tf1 kernel: lv1 rebuild resource directory Jun 10 04:02:03 tf1 kernel: clvmd rebuild resource directory Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 resources Jun 10 04:02:03 tf1 kernel: clvmd purge requests Jun 10 04:02:03 tf1 kernel: clvmd purged 0 requests Jun 10 04:02:03 tf1 kernel: clvmd mark waiting requests Jun 10 04:02:03 tf1 kernel: clvmd marked 0 requests Jun 10 04:02:03 tf1 kernel: clvmd purge locks of departed nodes Jun 10 04:02:03 tf1 kernel: clvmd purged 0 locks Jun 10 04:02:03 tf1 kernel: clvmd update remastered resources Jun 10 04:02:03 tf1 kernel: clvmd updated 1 resources Jun 10 04:02:03 tf1 kernel: clvmd rebuild locks Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 locks Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 done Jun 10 04:02:03 tf1 kernel: clvmd move flags 0,0,1 ids 4,7,7 Jun 10 04:02:03 tf1 kernel: clvmd process held requests Jun 10 04:02:03 tf1 kernel: clvmd processed 0 requests Jun 10 04:02:03 tf1 kernel: clvmd resend marked requests Jun 10 04:02:03 tf1 kernel: clvmd resent 0 requests Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 finished Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 518 resources Jun 10 04:02:03 tf1 kernel: lv1 purge requests Jun 10 04:02:03 tf1 kernel: lv1 purged 0 requests Jun 10 04:02:03 tf1 kernel: lv1 mark waiting requests Jun 10 04:02:03 tf1 kernel: lv1 marked 0 requests Jun 10 04:02:03 tf1 kernel: lv1 purge locks of departed nodes Jun 10 04:02:03 tf1 kernel: lv1 purged 530 locks Jun 10 04:02:03 tf1 kernel: lv1 update remastered resources Jun 10 04:02:03 tf1 kernel: lv1 updated 20609 resources Jun 10 04:02:03 tf1 kernel: lv1 rebuild locks Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 0 locks Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 done Jun 10 04:02:03 tf1 kernel: lv1 move flags 0,0,1 ids 5,7,7 Jun 10 04:02:03 tf1 kernel: lv1 process held requests Jun 10 04:02:03 tf1 kernel: lv1 processed 0 requests Jun 10 04:02:03 tf1 kernel: lv1 resend marked requests Jun 10 04:02:03 tf1 kernel: lv1 resent 0 requests Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 finished Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 0 last_start 6 last_finish 0 Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 2 type 2 event 6 flags 250 Jun 10 04:02:03 tf1 kernel: 6851 claim_jid 1 Jun 10 04:02:03 tf1 kernel: 6851 pr_start 6 done 1 Jun 10 04:02:03 tf1 kernel: 6851 pr_finish flags 5a Jun 10 04:02:03 tf1 kernel: 6840 recovery_done jid 1 msg 309 a Jun 10 04:02:03 tf1 kernel: 6840 recovery_done nodeid 1 flg 18 Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 6 last_start 7 last_finish 6 Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 1 type 1 event 7 flags 21a Jun 10 04:02:03 tf1 kernel: 6851 pr_start cb jid 0 id 2 Jun 10 04:02:03 tf1 kernel: 6851 pr_start 7 done 0 Jun 10 04:02:03 tf1 kernel: 6854 recovery_done jid 0 msg 309 11a Jun 10 04:02:03 tf1 kernel: 6854 recovery_done nodeid 2 flg 1b Jun 10 04:02:03 tf1 kernel: 6854 recovery_done start_done 7 Jun 10 04:02:03 tf1 kernel: 6850 pr_finish flags 1a Jun 10 04:02:03 tf1 kernel: Jun 10 04:02:03 tf1 kernel: Jun 10 04:02:03 tf1 kernel: lock_dlm: Assertion failed on line 428 of file /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c Jun 10 04:02:03 tf1 kernel: lock_dlm: assertion: "!error" Jun 10 04:02:03 tf1 kernel: lock_dlm: time = 1252230568 Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8 Jun 10 04:02:03 
tf1 kernel: Jun 10 04:02:03 tf1 kernel: ------------[ cut here ]------------ Jun 10 04:02:03 tf1 kernel: kernel BUG at /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c:428! Jun 10 04:02:03 tf1 kernel: invalid operand: 0000 [#1] Jun 10 04:02:03 tf1 kernel: SMP Jun 10 04:02:03 tf1 kernel: Modules linked in: nls_utf8 vfat fat usb_storage lock_dlm(U) dcdipm(U) dcdbas(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd ehci_hcd hw_random shpchp eepro100 e100 mii e1000 floppy sg ext3 jbd dm_mod aic7xxx megaraid_mbox megaraid_mm sd_mod scsi _mod Jun 10 04:02:03 tf1 kernel: CPU: 3 Jun 10 04:02:03 tf1 kernel: EIP: 0060:[] Tainted: P VLI Jun 10 04:02:03 tf1 kernel: EFLAGS: 00010246 (2.6.9-34.ELsmp) Jun 10 04:02:03 tf1 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm] Jun 10 04:02:03 tf1 kernel: eax: 00000001 ebx: ffffffea ecx: c585ace8 edx: f8bcc15f Jun 10 04:02:03 tf1 kernel: esi: f8bc7798 edi: f77c8400 ebp: c2361600 esp: c585ace4 Jun 10 04:02:03 tf1 kernel: ds: 007b es: 007b ss: 0068 Jun 10 04:02:03 tf1 kernel: Process df (pid: 15930, threadinfo=c585a000 task=d94fa6b0) Jun 10 04:02:03 tf1 kernel: Stack: f8bcc15f 20202020 33202020 20202020 20202020 20202020 31312020 00000018 Jun 10 04:02:03 tf1 kernel: d2956694 c2361600 00000003 00000000 c2361600 f8bc7828 00000003 f8bcf860 Jun 10 04:02:03 tf1 kernel: f8ba0000 f8bf45b2 00000000 00000001 f4fd2064 f4fd2048 f8ba0000 f8bea5cd Jun 10 04:02:03 tf1 kernel: Call Trace: Jun 10 04:02:03 tf1 kernel: [] lm_dlm_lock+0x49/0x52 [lock_dlm] Jun 10 04:02:03 tf1 kernel: [] gfs_lm_lock+0x35/0x4d [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_xmote_th+0x130/0x172 [gfs] Jun 10 04:02:03 tf1 kernel: [] rq_promote+0xc8/0x147 [gfs] Jun 10 04:02:03 tf1 kernel: [] run_queue+0x91/0xc1 [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_nq+0xcf/0x116 [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_nq_init+0x13/0x26 [gfs] Jun 10 04:02:03 tf1 kernel: [] stat_gfs_async+0x119/0x187 [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_stat_gfs+0x27/0x4e [gfs] Jun 10 04:02:03 tf1 kernel: [] superblock_has_perm+0x1f/0x23 Jun 10 04:02:03 tf1 kernel: [] gfs_statfs+0x26/0xc7 [gfs] Jun 10 04:02:03 tf1 kernel: [] vfs_statfs+0x41/0x59 Jun 10 04:02:03 tf1 kernel: [] vfs_statfs64+0xe/0x28 Jun 10 04:02:03 tf1 kernel: [] __user_walk+0x4a/0x51 Jun 10 04:02:03 tf1 kernel: [] sys_statfs64+0x52/0xb2 Jun 10 04:02:03 tf1 kernel: [] do_mmap_pgoff+0x568/0x666 Jun 10 04:02:03 tf1 kernel: [] sys_mmap2+0x7e/0xaf Jun 10 04:02:03 tf1 kernel: [] do_page_fault+0x0/0x5c6 Jun 10 04:02:03 tf1 kernel: [] syscall_call+0x7/0xb Jun 10 04:02:03 tf1 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8a c2 bc f8 e8 ce ae 55 c7 83 c4 38 68 5f c1 bc f8 e8 c1 ae 55 c7 <0f> 0b ac 01 a7 c0 bc f8 68 61 c1 bc f8 e8 7c a6 55 c7 83 c4 20 Jun 10 04:02:03 tf1 kernel: <0>Fatal exception: panic in 5 seconds Jun 10 04:02:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45210 seconds. Jun 16 10:48:47 tf1 syslogd 1.4.1: restart. From carlopmart at gmail.com Tue Jun 20 08:24:34 2006 From: carlopmart at gmail.com (carlopmart) Date: Tue, 20 Jun 2006 10:24:34 +0200 Subject: [Linux-cluster] Problems with ccsd (SOLVED) In-Reply-To: <448EC753.1010505@dorm.org> References: <590a9c800606130123s18bc3a7ic42fe7dd85ad3cc8@mail.gmail.com> <448EC753.1010505@dorm.org> Message-ID: <4497B0C2.9020309@gmail.com> Sorry for my later response ... That is the solution ... Many thanks Brenton. 
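For anyone else who hits the "Magma plugins are not in the right spot" hint, a quick check looks like this; the plugin directory shown is where the RHEL4 packages normally put them, so adjust if your layout differs:

  rpm -q magma magma-plugins        # both packages need to be installed
  ls /usr/lib/magma/plugins/        # (or /usr/lib64/magma/plugins on x86_64)
  service ccsd restart              # ccsd should now reach the cluster manager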
Brenton Rothchild wrote: > IIRC, I've seen this when the magma-plugins RPM wasn't installed, > if you're using RPMS that is :) > > -Brenton Rothchild > > > C. L. Martinez wrote: >> Hi all, >> >> I have setup two rhel4 U3 boxes with rhcs 4. When ccsd process tries >> to start returns me this error: >> >> [root at srvimss1 init.d]# ccsd >> Failed to connect to cluster manager. >> Hint: Magma plugins are not in the right spot. >> >> How can I fix this?? Where is the problem?? >> >> My cluster.conf: >> >> >> >> >> >> >> >> >> > nodename="srvimss1"/> >> >> >> >> >> >> >> > nodename="srvimss2"/> >> >> >> >> >> >> >> > servers="srvmgmt"/> >> >> >> >> > restricted="1"> >> > priority="1"/> >> > priority="2"/> >> >> > restricted="1"> >> > priority="2"/> >> > priority="1"/> >> >> >> >> >> >> >> -- >> C.L. Martinez >> clopmart at gmail.com >> >> >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- CL Martinez carlopmart {at} gmail {d0t} com From mathieu.avila at seanodes.com Tue Jun 20 09:53:53 2006 From: mathieu.avila at seanodes.com (Mathieu Avila) Date: Tue, 20 Jun 2006 11:53:53 +0200 Subject: [Linux-cluster] Compilation problem with GFS/GNBD and kernel panics on stress. In-Reply-To: <4496ADDC.8010206@redhat.com> References: <44968F28.8070505@seanodes.com> <4496ADDC.8010206@redhat.com> Message-ID: <4497C5B1.3090501@seanodes.com> Wendy Cheng wrote: > Mathieu Avila wrote: > >> Hello all, >> (I've already posted this to cluster-devel at redhat.com,and it seems it >> wasn't the appropriate place as i didn't get any answer. Sorry for >> the cross-posting.) > > > You posted to the right list (cluster-devel) and we've checked into > the issues over the weekend. CVS head should have the correct chagnes > now: > > > -- Wendy Thank you Wendy, Do you have any idea on the other problem (crash of GNBD+GFS under heavy stress) ? Are there any known problems with the versions I use ? Do you need additional information to deal with this issue ? -- Mathieu From djkast at gmail.com Tue Jun 20 18:45:58 2006 From: djkast at gmail.com (DJ-Kast .) Date: Tue, 20 Jun 2006 14:45:58 -0400 Subject: [Linux-cluster] GFS or ??? Message-ID: Hi, I am looking for advice on a configuration for a portal I am setting up. I will have 3 Load Balanced BSD web servers that will be using a SAN for storage I will have 2 clustered Redhat boxes, 1 active and 1 passive connected to the SAN. The SAN will be connected via iSCSI to the 2 Redhat boxes Do I need to use GFS to mount the drives? I am skeptical about doing the GFS->NFS export, as I've seen lots of posts of people having problems. Can this extra step be eliminated by something more efficient for the setup I require? Thanks in advance -Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at monsterjam.org Tue Jun 20 22:11:05 2006 From: jason at monsterjam.org (Jason) Date: Tue, 20 Jun 2006 18:11:05 -0400 Subject: [Linux-cluster] [2nd try: servers crashing while not doing much.] 
Message-ID: <20060620221104.GA20673@monsterjam.org> hey folks, I have 2 nodes running GFS 6.1.5 [root at tf1 ~]# rpm -qa | grep -i gfs GFS-6.1.5-0 GFS-kernheaders-2.6.9-49.1 GFS-kernel-smp-2.6.9-49.1 [root at tf1 ~]# rpm -qa | grep -i ccs ccs-devel-1.0.3-0 ccs-1.0.3-0 [root at tf1 ~]# [root at tf1 ~]# uname -a Linux tf1.localdomain 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux [root at tf1 ~]# and last week, we had them both go down on us unexpectedly. one had paniced and the other was powered off.. these systems are NOT in production yet, so there was some data on the GFS partition, but im pretty sure that there was not much activity when the boxes went down. Any help on what to do about this would be appreciated.. Here is the log from the one that panicd. Jun 10 03:59:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45030 seconds. Jun 10 03:59:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45060 seconds. Jun 10 04:00:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45090 seconds. Jun 10 04:00:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45120 seconds. Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session opened for user root by (uid=0) Jun 10 04:01:01 tf1 crond(pam_unix)[15618]: session closed for user root Jun 10 04:01:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45150 seconds. Jun 10 04:01:37 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45180 seconds. Jun 10 04:02:01 tf1 crond(pam_unix)[15620]: session opened for user root by (uid=0) Jun 10 04:02:03 tf1 kernel: des 1 Jun 10 04:02:03 tf1 kernel: clvmd total nodes 1 Jun 10 04:02:03 tf1 kernel: lv1 rebuild resource directory Jun 10 04:02:03 tf1 kernel: clvmd rebuild resource directory Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 resources Jun 10 04:02:03 tf1 kernel: clvmd purge requests Jun 10 04:02:03 tf1 kernel: clvmd purged 0 requests Jun 10 04:02:03 tf1 kernel: clvmd mark waiting requests Jun 10 04:02:03 tf1 kernel: clvmd marked 0 requests Jun 10 04:02:03 tf1 kernel: clvmd purge locks of departed nodes Jun 10 04:02:03 tf1 kernel: clvmd purged 0 locks Jun 10 04:02:03 tf1 kernel: clvmd update remastered resources Jun 10 04:02:03 tf1 kernel: clvmd updated 1 resources Jun 10 04:02:03 tf1 kernel: clvmd rebuild locks Jun 10 04:02:03 tf1 kernel: clvmd rebuilt 0 locks Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 done Jun 10 04:02:03 tf1 kernel: clvmd move flags 0,0,1 ids 4,7,7 Jun 10 04:02:03 tf1 kernel: clvmd process held requests Jun 10 04:02:03 tf1 kernel: clvmd processed 0 requests Jun 10 04:02:03 tf1 kernel: clvmd resend marked requests Jun 10 04:02:03 tf1 kernel: clvmd resent 0 requests Jun 10 04:02:03 tf1 kernel: clvmd recover event 7 finished Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 518 resources Jun 10 04:02:03 tf1 kernel: lv1 purge requests Jun 10 04:02:03 tf1 kernel: lv1 purged 0 requests Jun 10 04:02:03 tf1 kernel: lv1 mark waiting requests Jun 10 04:02:03 tf1 kernel: lv1 marked 0 requests Jun 10 04:02:03 tf1 kernel: lv1 purge locks of departed nodes Jun 10 04:02:03 tf1 kernel: lv1 purged 530 locks Jun 10 04:02:03 tf1 kernel: lv1 update remastered resources Jun 10 04:02:03 tf1 kernel: lv1 updated 20609 resources Jun 10 04:02:03 tf1 kernel: lv1 rebuild locks Jun 10 04:02:03 tf1 kernel: lv1 rebuilt 0 locks Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 done Jun 10 04:02:03 tf1 kernel: lv1 move flags 0,0,1 ids 5,7,7 Jun 10 04:02:03 tf1 kernel: lv1 process held requests Jun 
10 04:02:03 tf1 kernel: lv1 processed 0 requests Jun 10 04:02:03 tf1 kernel: lv1 resend marked requests Jun 10 04:02:03 tf1 kernel: lv1 resent 0 requests Jun 10 04:02:03 tf1 kernel: lv1 recover event 7 finished Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 0 last_start 6 last_finish 0 Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 2 type 2 event 6 flags 250 Jun 10 04:02:03 tf1 kernel: 6851 claim_jid 1 Jun 10 04:02:03 tf1 kernel: 6851 pr_start 6 done 1 Jun 10 04:02:03 tf1 kernel: 6851 pr_finish flags 5a Jun 10 04:02:03 tf1 kernel: 6840 recovery_done jid 1 msg 309 a Jun 10 04:02:03 tf1 kernel: 6840 recovery_done nodeid 1 flg 18 Jun 10 04:02:03 tf1 kernel: 6851 pr_start last_stop 6 last_start 7 last_finish 6 Jun 10 04:02:03 tf1 kernel: 6851 pr_start count 1 type 1 event 7 flags 21a Jun 10 04:02:03 tf1 kernel: 6851 pr_start cb jid 0 id 2 Jun 10 04:02:03 tf1 kernel: 6851 pr_start 7 done 0 Jun 10 04:02:03 tf1 kernel: 6854 recovery_done jid 0 msg 309 11a Jun 10 04:02:03 tf1 kernel: 6854 recovery_done nodeid 2 flg 1b Jun 10 04:02:03 tf1 kernel: 6854 recovery_done start_done 7 Jun 10 04:02:03 tf1 kernel: 6850 pr_finish flags 1a Jun 10 04:02:03 tf1 kernel: Jun 10 04:02:03 tf1 kernel: Jun 10 04:02:03 tf1 kernel: lock_dlm: Assertion failed on line 428 of file /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c Jun 10 04:02:03 tf1 kernel: lock_dlm: assertion: "!error" Jun 10 04:02:03 tf1 kernel: lock_dlm: time = 1252230568 Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8 Jun 10 04:02:03 tf1 kernel: Jun 10 04:02:03 tf1 kernel: ------------[ cut here ]------------ Jun 10 04:02:03 tf1 kernel: kernel BUG at /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c:428! Jun 10 04:02:03 tf1 kernel: invalid operand: 0000 [#1] Jun 10 04:02:03 tf1 kernel: SMP Jun 10 04:02:03 tf1 kernel: Modules linked in: nls_utf8 vfat fat usb_storage lock_dlm(U) dcdipm(U) dcdbas(U) parport_pc lp parport autofs4 i2c_dev i2c_core gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc button battery ac uhci_hcd ehci_hcd hw_random shpchp eepro100 e100 mii e1000 floppy sg ext3 jbd dm_mod aic7xxx megaraid_mbox megaraid_mm sd_mod scsi _mod Jun 10 04:02:03 tf1 kernel: CPU: 3 Jun 10 04:02:03 tf1 kernel: EIP: 0060:[] Tainted: P VLI Jun 10 04:02:03 tf1 kernel: EFLAGS: 00010246 (2.6.9-34.ELsmp) Jun 10 04:02:03 tf1 kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm] Jun 10 04:02:03 tf1 kernel: eax: 00000001 ebx: ffffffea ecx: c585ace8 edx: f8bcc15f Jun 10 04:02:03 tf1 kernel: esi: f8bc7798 edi: f77c8400 ebp: c2361600 esp: c585ace4 Jun 10 04:02:03 tf1 kernel: ds: 007b es: 007b ss: 0068 Jun 10 04:02:03 tf1 kernel: Process df (pid: 15930, threadinfo=c585a000 task=d94fa6b0) Jun 10 04:02:03 tf1 kernel: Stack: f8bcc15f 20202020 33202020 20202020 20202020 20202020 31312020 00000018 Jun 10 04:02:03 tf1 kernel: d2956694 c2361600 00000003 00000000 c2361600 f8bc7828 00000003 f8bcf860 Jun 10 04:02:03 tf1 kernel: f8ba0000 f8bf45b2 00000000 00000001 f4fd2064 f4fd2048 f8ba0000 f8bea5cd Jun 10 04:02:03 tf1 kernel: Call Trace: Jun 10 04:02:03 tf1 kernel: [] lm_dlm_lock+0x49/0x52 [lock_dlm] Jun 10 04:02:03 tf1 kernel: [] gfs_lm_lock+0x35/0x4d [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_xmote_th+0x130/0x172 [gfs] Jun 10 04:02:03 tf1 kernel: [] rq_promote+0xc8/0x147 [gfs] Jun 10 04:02:03 tf1 kernel: [] run_queue+0x91/0xc1 [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_nq+0xcf/0x116 [gfs] Jun 10 04:02:03 tf1 kernel: [] gfs_glock_nq_init+0x13/0x26 [gfs] Jun 10 04:02:03 tf1 kernel: [] stat_gfs_async+0x119/0x187 [gfs] Jun 
10 04:02:03 tf1 kernel: [] gfs_stat_gfs+0x27/0x4e [gfs] Jun 10 04:02:03 tf1 kernel: [] superblock_has_perm+0x1f/0x23 Jun 10 04:02:03 tf1 kernel: [] gfs_statfs+0x26/0xc7 [gfs] Jun 10 04:02:03 tf1 kernel: [] vfs_statfs+0x41/0x59 Jun 10 04:02:03 tf1 kernel: [] vfs_statfs64+0xe/0x28 Jun 10 04:02:03 tf1 kernel: [] __user_walk+0x4a/0x51 Jun 10 04:02:03 tf1 kernel: [] sys_statfs64+0x52/0xb2 Jun 10 04:02:03 tf1 kernel: [] do_mmap_pgoff+0x568/0x666 Jun 10 04:02:03 tf1 kernel: [] sys_mmap2+0x7e/0xaf Jun 10 04:02:03 tf1 kernel: [] do_page_fault+0x0/0x5c6 Jun 10 04:02:03 tf1 kernel: [] syscall_call+0x7/0xb Jun 10 04:02:03 tf1 kernel: Code: 26 50 0f bf 45 24 50 53 ff 75 08 ff 75 04 ff 75 0c ff 77 18 68 8a c2 bc f8 e8 ce ae 55 c7 83 c4 38 68 5f c1 bc f8 e8 c1 ae 55 c7 <0f> 0b ac 01 a7 c0 bc f8 68 61 c1 bc f8 e8 7c a6 55 c7 83 c4 20 Jun 10 04:02:03 tf1 kernel: <0>Fatal exception: panic in 5 seconds Jun 10 04:02:07 tf1 ccsd[3939]: Unable to connect to cluster infrastructure after 45210 seconds. Jun 16 10:48:47 tf1 syslogd 1.4.1: restart. ----- End forwarded message ----- -- ================================================ | Jason Welsh jason at monsterjam.org | | http://monsterjam.org DSS PGP: 0x5E30CC98 | | gpg key: http://monsterjam.org/gpg/ | ================================================ From jbrassow at redhat.com Wed Jun 21 02:19:21 2006 From: jbrassow at redhat.com (Jonathan E Brassow) Date: Tue, 20 Jun 2006 21:19:21 -0500 Subject: [Linux-cluster] GFS or ??? In-Reply-To: References: Message-ID: <37358bd202580513a95f2da80e092dfa@redhat.com> GFS is primarily used in active/active setups. You may be able to get by with rgmanager if you are using active/passive, but I'll let someone who knows more talk about that. brassow On Jun 20, 2006, at 1:45 PM, DJ-Kast . wrote: > Hi, > > ? I am looking for advice on a configuration for a portal I am setting > up. > > I will have 3 Load Balanced BSD web servers that will be using a SAN > for storage > > I will have 2 clustered Redhat boxes, 1 active and 1 passive connected > to the SAN. > The SAN will be connected via iSCSI to the 2 Redhat boxes > > Do I need to use GFS to mount the drives? > > I am skeptical about doing the GFS->NFS export, as I've seen lots of > posts of people > having problems.? Can this extra step be eliminated by something more > efficient for > the setup I require? > > Thanks in advance > > -Paul > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From Sherman.Chan at world.net Wed Jun 21 08:55:07 2006 From: Sherman.Chan at world.net (Sherman Chan) Date: Wed, 21 Jun 2006 16:55:07 +0800 Subject: [Linux-cluster] RE : GFS Sharing Hard Disk Message-ID: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net> Hi, I would like to know is that possible to has a hard disk physically shared by multi servers. I have a SAN which has a logical disk setup that can be mount/accessed by multi servers at the same time directly, however lacking off global locking/synchronization system the data can not be shared properly, data I update from server 1, could not been seen on server 2, unless I dismount and remount the disk on server 2. I know thing can not be that easy. I do not want to lose performance by using NFS or iSCSI. I have look at GFS, it seems that is a right tools to me but I do not want to setup a full cluster environment. Does it has any way to use GFS without a complete cluster setup? 
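For reference, the "complete cluster setup" the reply below calls for is smaller than it sounds: membership (cman), locking (dlm) and fencing, driven by one cluster.conf. The sketch below is a rough illustration only; every node name, device path and address is a made-up placeholder, and the two_node/expected_votes line applies only when there are exactly two nodes:

    <?xml version="1.0"?>
    <cluster name="sancluster" config_version="1">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="server1" votes="1">
          <fence><method name="1"><device name="apc" port="1"/></method></fence>
        </clusternode>
        <clusternode name="server2" votes="1">
          <fence><method name="1"><device name="apc" port="2"/></method></fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice name="apc" agent="fence_apc" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
      </fencedevices>
    </cluster>

    # on every node, in this order:
    service ccsd start; service cman start; service fenced start
    # once, from any one node (clustername:fsname must match the cluster name above):
    gfs_mkfs -p lock_dlm -t sancluster:shared -j 2 /dev/sdb1
    # then on every node:
    mount -t gfs /dev/sdb1 /mnt/shared

Each server then mounts the same LUN directly and sees the other servers' updates immediately, which is exactly the behaviour that is missing without the locking layer.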
Thanks Sherman -------------- next part -------------- An HTML attachment was scrubbed... URL: From adingman at cookgroup.com Wed Jun 21 12:24:42 2006 From: adingman at cookgroup.com (Andrew C. Dingman) Date: Wed, 21 Jun 2006 08:24:42 -0400 Subject: [Linux-cluster] RE : GFS Sharing Hard Disk In-Reply-To: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net> References: <95DE5EAA51B5014CB68664BE5192A89002723607@exchange.world.net> Message-ID: <1150892682.13144.6.camel@ampelos.cin.cook> The short answer is that you need a "complete cluster setup" to make any simultaneous-access shared-storage solution work. You need cluster membership, locking, and fencing for GFS. Depending on your needs, you might be able to skip much of the rest of cluster suite's feature set, but you really do need that membership and locking infrastructure. It makes safe concurrent access to the same disks possible. On Wed, 2006-06-21 at 16:55 +0800, Sherman Chan wrote: > Hi, > I would like to know is that possible to has a hard disk physically > shared by multi servers. I have a SAN which has a logical disk setup > that can be mount/accessed by multi servers at the same time directly, > however lacking off global locking/synchronization system the data can > not be shared properly, data I update from server 1, could not been > seen on server 2, unless I dismount and remount the disk on server > 2. I know thing can not be that easy. > > I do not want to lose performance by using NFS or iSCSI. I have look > at GFS, it seems that is a right tools to me but I do not want to > setup a full cluster environment. Does it has any way to use GFS > without a complete cluster setup? > > > > Thanks > Sherman > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Andrew C. Dingman Unix Administrator Cook (812)339-2235 x2131 adingman at cookgroup.com From ranjtech at gmail.com Wed Jun 21 17:35:05 2006 From: ranjtech at gmail.com (RR) Date: Thu, 22 Jun 2006 03:35:05 +1000 Subject: [Linux-cluster] partitioning of filesystems in cluster nodes Message-ID: <001a01c69559$0a7cd920$1f768b60$@com> Hello all, Is there a particular manner I should partition the local filesystems of each of the cluster nodes to support the Cluster Suite w/GFS or it doesn't matter? My specific requirement is that I may or may not be able to change the location where this specific application writes data. And I need that directory/filesystem that this data is written to, e.g. /var/spool to be accessible on my iSCSI SAN by all the cluster nodes. The answer could be very simple but want to double check. Rgds, RR -------------- next part -------------- An HTML attachment was scrubbed... URL: From teigland at redhat.com Wed Jun 21 17:47:59 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 12:47:59 -0500 Subject: [Linux-cluster] [2nd try: servers crashing while not doing much.] 
In-Reply-To: <20060620221104.GA20673@monsterjam.org> References: <20060620221104.GA20673@monsterjam.org> Message-ID: <20060621174759.GA4706@redhat.com> On Tue, Jun 20, 2006 at 06:11:05PM -0400, Jason wrote: > Jun 10 04:02:03 tf1 kernel: lock_dlm: Assertion failed on line 428 of file > /usr/src/redhat/BUILD/gfs-kernel-2.6.9-49/smp/src/dlm/lock.c > Jun 10 04:02:03 tf1 kernel: lock_dlm: assertion: "!error" > Jun 10 04:02:03 tf1 kernel: lock_dlm: time = 1252230568 > Jun 10 04:02:03 tf1 kernel: lv1: num=3,11 err=-22 cur=-1 req=3 lkf=8 Unfortunately this assertion doesn't tell us much since any number of problems can lead to this. We usually depend on previous debug messages to figure out what happened, but there wasn't anything unusual in the logs you posted. I'm going to be adding some extra debugging to help narrow this down, but that's not slated until RHEL4U5. Dave From teigland at redhat.com Wed Jun 21 17:54:30 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 12:54:30 -0500 Subject: [Linux-cluster] GFS locking issues In-Reply-To: <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com> References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> <20060615190959.GB1913@redhat.com> <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com> Message-ID: <20060621175430.GB4706@redhat.com> On Fri, Jun 16, 2006 at 06:37:14PM +0300, Anton Kornev wrote: > gnbd (pid 5836: alogc.pl) got signal 9 > gnbd0: Send control failed (result -4) > gnbd (pid 5836: alogc.pl) got signal 15 > gnbd0: Send control failed (result -4) This and the fact that a number of processes appear to be blocked in the i/o path seem to point at gnbd as the hold-up. Dave > 51 D wait_on_buffer pdflush > 5771 D lock_page lock_dlm1 > 5776 D - gfs_logd > 5777 D - gfs_quotad > 5778 D - gfs_inoded > 5892 D - httpd > 5895 D glock_wait_internal httpd > 5896 D glock_wait_internal httpd > 5897 D glock_wait_internal httpd > 5911 D glock_wait_internal httpd > 5915 D wait_on_buffer httpd > 5930 D wait_on_buffer sh > pdflush D ffffffff8014aabc 0 51 6 53 50 > (L-TLB) > 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00 > 0000000000000216 ffffffffa0042916 000001011aca60c0 0000000000000008 > 000001011fdef7f0 0000000000000dfa > Call Trace:{:dm_mod:dm_request+396} > {keventd_create_kthread+0} > {io_schedule+38} > {__wait_on_buffer+125} > {bh_wake_function+0} > {bh_wake_function+0} > {:gfs:gfs_logbh_wait+49} > {:gfs:disk_commit+794} > {:gfs:log_refund+111} > {:gfs:log_flush_internal+510} > {sync_supers+167} {wb_kupdate+36} > > {pdflush+323} {wb_kupdate+0} > {pdflush+0} {kthread+200} > {child_rip+8} > {keventd_create_kthread+0} > {kthread+0} {child_rip+0} > lock_dlm1 D 000001000c0096e0 0 5771 6 5772 5766 > (L-TLB) > 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069 > 000001011420b030 0000000000000069 000001000c00a940 000000010000eb10 > 000001011a887030 0000000000001cae > Call Trace:{__generic_unplug_device+19} > {io_schedule+38} > {__lock_page+191} > {page_wake_function+0} > {page_wake_function+0} > {truncate_inode_pages+519} > {:gfs:gfs_inval_page+63} > {:gfs:drop_bh+233} > {:gfs:gfs_glock_cb+194} > {:lock_dlm:dlm_async+1989} > {default_wake_function+0} > {keventd_create_kthread+0} > {:lock_dlm:dlm_async+0} > {keventd_create_kthread+0} > {kthread+200} {child_rip+8} > {keventd_create_kthread+0} > {kthread+0} > {child_rip+0} > gfs_logd D 0000000000000000 0 5776 1 5777 5775 > (L-TLB) > 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85 > 000001011387fe58 
ffffffff80304add ffffffff803cca80 0000000000000246 > 00000101143fe030 00000000000000b5 > Call Trace:{thread_return+0} > {thread_return+88} > {:gfs:lock_on_glock+112} > {__down_write+134} > {:gfs:gfs_ail_empty+56} > {:gfs:gfs_logd+77} > {child_rip+8} > {dummy_d_instantiate+0} > {:gfs:gfs_logd+0} {child_rip+0} > > gfs_quotad D 0000000000000000 0 5777 1 5778 5776 > (L-TLB) > 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85 > 0000010113881eb8 ffffffff80304add 000001011ff87030 0000000100000074 > 000001011430f7f0 0000000000000128 > Call Trace:{thread_return+0} > {thread_return+88} > {__down_write+134} > {:gfs:gfs_quota_sync+226} > {:gfs:gfs_quotad+127} > {child_rip+8} > {dummy_d_instantiate+0} > {dummy_d_instantiate+0} > {dummy_d_instantiate+0} > {:gfs:gfs_quotad+0} > {child_rip+0} > gfs_inoded D 0000000000000000 0 5778 1 5807 5777 > (L-TLB) > 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0 > 0000000000000000 ffffffff80304a85 0000010113883ec8 0000000180304add > 000001011e2937f0 00000000000000c2 > Call Trace:{thread_return+0} > {__down_write+134} > {:gfs:unlinked_find+115} > {:gfs:gfs_unlinked_dealloc+25} > {:gfs:gfs_inoded+66} > {child_rip+8} > {:gfs:gfs_inoded+0} {child_rip+0} > > > httpd D ffffffff80304190 0 5892 1 5893 5826 > (NOTLB) > 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001 > 0000000000000000 0000000000000000 0000010114667980 0000000111b75bc0 > 00000101143fe7f0 00000000000009ad > Call Trace:{__down+147} > {default_wake_function+0} > {generic_file_write_nolock+158} > {__down_failed+53} > {:gfs:.text.lock.dio+95} > {:gfs:gfs_trans_add_bh+205} > {:gfs:do_write_buf+1138} > {:gfs:walk_vm+278} > {:gfs:do_write_buf+0} > {:gfs:do_write_buf+0} > {:gfs:__gfs_write+201} > {vfs_write+207} > {sys_write+69} {system_call+126} > > httpd D 0000010110ad7d48 0 5895 5892 5896 5893 > (NOTLB) > 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075 > 0000010117002030 0000000000000075 000001000c002940 0000000000000001 > 00000101170027f0 000000000001300e > Call Trace:{try_to_wake_up+863} > {wait_for_completion+167} > {default_wake_function+0} > {default_wake_function+0} > {:gfs:glock_wait_internal+350} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > {:gfs:gfs_private_nopage+84} > {do_no_page+1003} > {do_wp_page+948} > {handle_mm_fault+343} > {get_signal_to_deliver+1118} > {do_page_fault+518} > {thread_return+0} > {thread_return+88} {error_exit+0} > > > httpd D 0000010110b5bd48 0 5896 5892 5897 5895 > (NOTLB) > 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075 > 00000101114787f0 0000000000000075 000001000c002940 0000000000000001 > 0000010117002030 000000000000fb3e > Call Trace:{try_to_wake_up+863} > {wait_for_completion+167} > {default_wake_function+0} > {default_wake_function+0} > {:gfs:glock_wait_internal+350} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > {:gfs:gfs_private_nopage+84} > {do_no_page+1003} > {do_wp_page+948} > {handle_mm_fault+343} > {get_signal_to_deliver+1118} > {do_page_fault+518} > {sys_accept+327} > {pipe_read+26} {error_exit+0} > > httpd D 0000000000000000 0 5897 5892 5911 5896 > (NOTLB) > 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075 > 0000010117002030 0000000000000075 000001000c00a940 000000001b16e030 > 00000101114787f0 000000000000fbe0 > Call Trace:{__generic_unplug_device+19} > {wait_for_completion+167} > {default_wake_function+0} > {default_wake_function+0} > {:gfs:glock_wait_internal+350} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > 
{:gfs:gfs_private_nopage+84} > {do_no_page+1003} > {do_wp_page+948} > {handle_mm_fault+343} > {get_signal_to_deliver+1118} > {do_page_fault+518} > {thread_return+0} > {thread_return+88} {error_exit+0} > > > httpd D 00000101100c3d48 0 5911 5892 5915 5897 > (NOTLB) > 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075 > 00000101170027f0 0000000000000075 000001000c002940 0000000000000000 > 000001011b16e030 000000000000187e > Call Trace:{try_to_wake_up+863} > {wait_for_completion+167} > {default_wake_function+0} > {default_wake_function+0} > {:gfs:glock_wait_internal+350} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > {:gfs:gfs_private_nopage+84} > {do_no_page+1003} > {do_wp_page+948} > {handle_mm_fault+343} > {get_signal_to_deliver+1118} > {do_page_fault+518} > {thread_return+0} > {thread_return+88} {error_exit+0} > > > httpd D 0000000000006a36 0 5915 5892 5911 > (NOTLB) > 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791 > 0000000000000000 0000000000000000 0000030348ac8c1c 0000000114a217f0 > 0000010114c997f0 000000000000076a > Call Trace:{:dlm:lkb_swqueue+43} > {io_schedule+38} > {__wait_on_buffer+125} > {bh_wake_function+0} > {bh_wake_function+0} > {:gfs:gfs_dreread+154} > {:gfs:gfs_dread+40} > {:gfs:gfs_get_meta_buffer+201} > {:gfs:gfs_copyin_dinode+23} > {:gfs:inode_go_lock+38} > {:gfs:glock_wait_internal+563} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > {:gfs:gfs_private_nopage+84} > {do_no_page+1003} > {do_wp_page+948} > {handle_mm_fault+343} > {get_signal_to_deliver+1118} > {do_page_fault+518} > {thread_return+0} > {thread_return+88} {error_exit+0} > > > sh D 000000000000001a 0 5930 2547 > (NOTLB) > 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00 > 0000010111293d88 0000000000000000 00000100dfc02400 0000000000010000 > 00000101148557f0 0000000000002010 > Call Trace:{io_schedule+38} > {__wait_on_buffer+125} > {bh_wake_function+0} > {bh_wake_function+0} > {:gfs:gfs_dreread+154} > {:gfs:gfs_dread+40} > {:gfs:gfs_get_meta_buffer+201} > {:gfs:gfs_copyin_dinode+23} > {:gfs:inode_go_lock+38} > {:gfs:glock_wait_internal+563} > {:gfs:gfs_glock_nq+961} > {:gfs:gfs_glock_nq_init+20} > {dummy_inode_permission+0} > {:gfs:gfs_permission+64} > {dput+56} {permission+51} > {__link_path_walk+372} > {link_path_walk+82} > {do_page_fault+575} > {__link_path_walk+1658} > {link_path_walk+82} > {do_page_fault+575} > {path_lookup+451} > {__user_walk+47} > {vfs_stat+24} {do_page_fault+575} > > {sys_newstat+17} {error_exit+0} > {system_call+126} From vcmarti at sph.emory.edu Wed Jun 21 18:13:40 2006 From: vcmarti at sph.emory.edu (Vernard C. Martin) Date: Wed, 21 Jun 2006 14:13:40 -0400 Subject: [Linux-cluster] Error starting up CLVMD In-Reply-To: <448974D7.7050801@redhat.com> References: <448885C5.4050505@sph.emory.edu><1149804993.12291.27.camel@techn etium.msp.redhat.com> <448974D7.7050801@redhat.com> Message-ID: <44998C54.7090403@sph.emory.edu> Patrick Caulfield wrote: > Bob's right, it sounds like the DLM isn't loaded. The module name is just > "dlm" BTW and the device should show up in /proc/misc and (if udev is running) > /dev/misc/dlm-control. lock_dlm is the GFS interface to the DLM...yes, I know > it's confusing. > Just a followup, that was indeed the problem and everything is running nicely at this time. Well, the cluster suite is running. 
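For anyone else chasing the same clvmd start-up symptom, the check Patrick describes above boils down to a few commands; the modprobe is only needed when the module is not pulled in automatically:

    lsmod | grep '^dlm'             # the plain "dlm" module, not just lock_dlm
    grep dlm /proc/misc             # the misc device should be registered here
    ls -l /dev/misc/dlm-control     # appears when udev is running
    modprobe dlm                    # load it by hand if it is missing, then retry clvmd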
I've still got other issues :-) Vernard From gstaltari at arnet.net.ar Wed Jun 21 18:10:30 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 15:10:30 -0300 Subject: [Linux-cluster] kernel panic - help! Message-ID: <44998B96.8010001@arnet.net.ar> Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last stable cluster tarball. The cluster was OK until we had a little SAN failure, since then, the cluster (entirely) is getting kernel panic. This is the dump: qmail-be-04 kernel: ------------[ cut here ]------------ qmail-be-04 kernel: kernel BUG at /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357! qmail-be-04 kernel: invalid opcode: 0000 [#1] qmail-be-04 kernel: SMP qmail-be-04 kernel: CPU: 0 qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm] qmail-be-04 kernel: eax: 00000004 ebx: 00000084 ecx: ffffeb92 edx: 00000000 qmail-be-04 kernel: esi: 00010001 edi: ffffffea ebp: dc9495c0 esp: e382fef4 qmail-be-04 kernel: ds: 007b es: 007b ss: 0068 qmail-be-04 kernel: Process gfs_glockd (pid: 29218, threadinfo=e382f000 task=f3524550) qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 007798a8 00000000 00010001 00000084 qmail-be-04 kernel: 00000002 f9618000 00000003 dc9495c0 eaa6ae84 f8e8f52e f8eb46b5 eaa6aeb4 All nodes dies at the same time with this kernel panic. Thanks German From gstaltari at arnet.net.ar Wed Jun 21 18:30:46 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 15:30:46 -0300 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <44998B96.8010001@arnet.net.ar> References: <44998B96.8010001@arnet.net.ar> Message-ID: <44999056.3060704@arnet.net.ar> German Staltari wrote: > Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last > stable cluster tarball. The cluster was OK until we had a little SAN > failure, since then, the cluster (entirely) is getting kernel panic. > This is the dump: > > qmail-be-04 kernel: ------------[ cut here ]------------ > qmail-be-04 kernel: kernel BUG at > /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357! > qmail-be-04 kernel: invalid opcode: 0000 [#1] > qmail-be-04 kernel: SMP > qmail-be-04 kernel: CPU: 0 > qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm] > qmail-be-04 kernel: eax: 00000004 ebx: 00000084 ecx: ffffeb92 > edx: 00000000 > qmail-be-04 kernel: esi: 00010001 edi: ffffffea ebp: dc9495c0 > esp: e382fef4 > qmail-be-04 kernel: ds: 007b es: 007b ss: 0068 > qmail-be-04 kernel: Process gfs_glockd (pid: 29218, > threadinfo=e382f000 task=f3524550) > qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 > 007798a8 00000000 00010001 00000084 > qmail-be-04 kernel: 00000002 f9618000 00000003 dc9495c0 > eaa6ae84 f8e8f52e f8eb46b5 eaa6aeb4 > > All nodes dies at the same time with this kernel panic. 
> Thanks > German > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > I think this would help too: Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: Assertion failed on line 357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: assertion: "!error" Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: time = 2512697 Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 num=2,7798a8 lkf=10001 flags=84 Thanks again German From teigland at redhat.com Wed Jun 21 18:34:29 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 13:34:29 -0500 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <44998B96.8010001@arnet.net.ar> References: <44998B96.8010001@arnet.net.ar> Message-ID: <20060621183429.GC4706@redhat.com> On Wed, Jun 21, 2006 at 03:10:30PM -0300, German Staltari wrote: > Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the last stable > cluster tarball. The cluster was OK until we had a little SAN failure, > since then, the cluster (entirely) is getting kernel panic. This is the > dump: Any messages before this? The best you could hope for with a SAN failure is that all the cluster nodes withdraw gfs, allowing you to reboot them without the panic. So, the end result wouldn't be all that different than the panics. Dave > qmail-be-04 kernel: ------------[ cut here ]------------ > qmail-be-04 kernel: kernel BUG at > /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357! > qmail-be-04 kernel: invalid opcode: 0000 [#1] > qmail-be-04 kernel: SMP > qmail-be-04 kernel: CPU: 0 > qmail-be-04 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm] > qmail-be-04 kernel: eax: 00000004 ebx: 00000084 ecx: ffffeb92 edx: > 00000000 > qmail-be-04 kernel: esi: 00010001 edi: ffffffea ebp: dc9495c0 esp: > e382fef4 > qmail-be-04 kernel: ds: 007b es: 007b ss: 0068 > qmail-be-04 kernel: Process gfs_glockd (pid: 29218, threadinfo=e382f000 > task=f3524550) > qmail-be-04 kernel: Stack: <0>f8e95673 f3b9f700 ffffffea 00000002 > 007798a8 00000000 00010001 00000084 > qmail-be-04 kernel: 00000002 f9618000 00000003 dc9495c0 eaa6ae84 > f8e8f52e f8eb46b5 eaa6aeb4 From rpeterso at redhat.com Wed Jun 21 18:46:04 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Wed, 21 Jun 2006 13:46:04 -0500 Subject: [Linux-cluster] partitioning of filesystems in cluster nodes In-Reply-To: <001a01c69559$0a7cd920$1f768b60$@com> References: <001a01c69559$0a7cd920$1f768b60$@com> Message-ID: <449993EC.4030803@redhat.com> RR wrote: > > Hello all, > > > > Is there a particular manner I should partition the local filesystems > of each of the cluster nodes to support the Cluster Suite w/GFS or it > doesn't matter? My specific requirement is that I may or may not be > able to change the location where this specific application writes > data. And I need that directory/filesystem that this data is written > to, e.g. /var/spool to be accessible on my iSCSI SAN by all the > cluster nodes. The answer could be very simple but want to double check. > > > > Rgds, > > RR > Hi RR, For your the local root partitions on the individual nodes, it's probably best to use ext3. On the SAN, use GFS and Red Hat Cluster Suite. Then perhaps you can create a symlink from your local node's mount point to the SAN, e.g. from /mnt/gfs_san/var/spool to its local /var/spool. 
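Something along these lines should do it; the volume, cluster and mount-point names below are only placeholders, and the journal count (-j) should match the number of nodes that will mount the filesystem:

    gfs_mkfs -p lock_dlm -t mycluster:spool -j 3 /dev/iscsi_vg/spool_lv   # run once, from one node
    mkdir -p /mnt/gfs_san
    mount -t gfs /dev/iscsi_vg/spool_lv /mnt/gfs_san                      # on every node (or via fstab)
    mkdir -p /mnt/gfs_san/var/spool
    mv /var/spool /var/spool.local
    ln -s /mnt/gfs_san/var/spool /var/spool

If the application refuses to follow a symlink, a bind mount (mount --bind /mnt/gfs_san/var/spool /var/spool) gives the same effect.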
Regards, Bob Peterson Red Hat Cluster Suite From gstaltari at arnet.net.ar Wed Jun 21 18:41:58 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 15:41:58 -0300 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <20060621183429.GC4706@redhat.com> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> Message-ID: <449992F6.9060608@arnet.net.ar> David Teigland wrote: > Any messages before this? The best you could hope for with a SAN failure > is that all the cluster nodes withdraw gfs, allowing you to reboot them > without the panic. So, the end result wouldn't be all that different than > the panics. > > This is the log just before the panic: Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from the cluster : Missed too many heartbeats Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from the cluster : No response to messages Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from the cluster : No response to messages Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from the cluster : No response to messages Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from the cluster : No response to messages Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been removed from the cluster : No response to messages Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No response to messages Jun 21 14:59:52 qmail-be-04 kernel: WARNING: dlm_emergency_shutdown Jun 21 14:59:52 qmail-be-04 fenced[17897]: process_events: service get event failed Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 1000041 from 3 req 1 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 3 Jun 21 14:59:53 qmail-be-04 last message repeated 7 times Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 3 Jun 21 14:59:53 qmail-be-04 last message repeated 6 times Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 1000041 from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 3 
Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 3 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 1 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 1000041 from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 1000041 from 3 req 1 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 1 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 1 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003b from 3 req 5 Jun 21 14:59:53 qmail-be-04 last message repeated 5 times Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 5 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003f from 3 req 5 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 5 Jun 21 14:59:53 qmail-be-04 last message repeated 3 times Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 9 Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid lockspace 100003d from 3 req 5 Jun 21 14:59:54 qmail-be-04 last message repeated 20 times Jun 21 14:59:54 qmail-be-04 kernel: dlm: dlm_unlock: lkid 3b013d lockspace not found Jun 21 14:59:54 qmail-be-04 kernel: store004-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore001-001 add_to_requestq cmd 3 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore003-004 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore003-002 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq 
cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 Jun 21 14:59:54 qmail-be-04 kernel: type 2 event 282 flags 21a Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_start 282 done 1 Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_finish flags 1a Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start last_stop 273 last_start 283 last_finish 273 Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start count 5 type 2 event 283 flags 21a Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start 283 done 1 Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start last_stop 283 last_start 285 last_finish 283 Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start count 6 type 2 event 285 flags 21a Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start 285 done 1 Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start last_stop 116 last_start 287 last_finish 116 Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start count 4 type 2 event 287 flags 21a Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start 287 done 1 Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_finish flags 1a Jun 21 14:59:55 qmail-be-04 kernel: 28975 rereq 2,1cec36 id e029f 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start last_stop 282 last_start 289 last_finish 282 Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start count 5 type 2 event 289 flags 21a Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start 289 done 1 Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_finish flags 1a Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start last_stop 287 last_start 291 last_finish 287 Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start count 5 type 2 event 291 flags 21a Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start 291 done 1 Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_finish flags 1a Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fe4b id a001e 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,3fd13 id 7007a 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faaf id 90009 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fdd6 id c0135 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,4fc8f id c023b 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,34a id 8011d 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fe4b id b03c3 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faaf id b001e 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,34a id 11000e 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faa8 id f0016 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faa8 id 1001a8 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fbd9 id f00e9 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fb4d id 802ac 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,5fbd9 id f0026 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,3fd13 id c009b 3,0 Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fdd6 id 8001d 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,4fc8f id c0367 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,2fe40 id a01fd 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start last_stop 289 last_start 293 last_finish 289 Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start count 6 type 2 event 293 flags 21a Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,7fa97 id 1502b3 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,3fcea id 702f3 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,5fc1e id 6015f 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fc1e 
id c01c1 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,2fdfa id c0362 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,2fdfa id c02ad 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fb4d id f01ce 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,7fa97 id d0293 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,3fcea id d02ad 3,0 Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start 293 done 1 Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_finish flags 1a Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 118 last_start 295 last_finish 118 Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start count 4 type 2 event 295 flags 21a Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start 295 done 1 Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_finish flags 1a Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start last_stop 291 last_start 297 last_finish 291 Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start count 6 type 2 event 297 flags 21a Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start 297 done 1 Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_finish flags 1a Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 295 last_start 299 last_finish 295 Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start count 5 type 2 event 299 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start 299 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start last_stop 299 last_start 301 last_finish 299 Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start count 6 type 2 event 301 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start 301 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 120 last_start 303 last_finish 120 Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 4 type 2 event 303 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 303 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 122 last_start 305 last_finish 122 Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 4 type 2 event 305 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 305 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start last_stop 124 last_start 308 last_finish 124 Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start count 4 type 2 event 308 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start 308 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29457 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start last_stop 303 last_start 309 last_finish 303 Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start count 5 type 2 event 309 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start 309 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 305 last_start 311 last_finish 305 Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 5 type 2 event 311 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 311 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 309 last_start 313 last_finish 309 Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 6 type 2 event 313 flags 21a Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 313 done 1 Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a Jun 21 14:59:58 
qmail-be-04 kernel: 29458 pr_start last_stop 308 last_start 315 last_finish 308 Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 5 type 2 event 315 flags 21a Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 315 done 1 Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_finish flags 1a Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start last_stop 311 last_start 317 last_finish 311 Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start count 6 type 2 event 317 flags 21a Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start 317 done 1 Jun 21 14:59:58 qmail-be-04 kernel: 29408 pr_finish flags 1a Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 315 last_start 319 last_finish 315 Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 6 type 2 event 319 flags 21a Jun 21 14:59:58 qmail-be-04 kernel: 29457 rereq 2,2bd7b5 id 801e5 3,0 Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 319 done 1 Jun 21 14:59:58 qmail-be-04 kernel: 29457 pr_finish flags 1a Jun 21 14:59:58 qmail-be-04 kernel: Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: Assertion failed on line 357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: assertion: "!error" Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: time = 2512697 Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 num=2,7798a8 lkf=10001 flags=84 Is the second panic today :( Thanks again German From teigland at redhat.com Wed Jun 21 19:42:49 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 14:42:49 -0500 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <449992F6.9060608@arnet.net.ar> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> Message-ID: <20060621194249.GB6765@redhat.com> On Wed, Jun 21, 2006 at 03:41:58PM -0300, German Staltari wrote: > Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from > the cluster : Missed too many heartbeats > Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from > the cluster : No response to messages > Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from > the cluster : No response to messages > Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from > the cluster : No response to messages > Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from > the cluster : No response to messages > Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity > Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been > removed from the cluster : No response to messages > Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message > Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No > response to messages This is what led to the gfs panic, the cluster shut down when it lost contact with all the other nodes. 
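When CMAN starts ejecting nodes like the lines quoted above, the useful state to capture from each node before rebooting is the membership and service-group view; the /proc paths below are the RHEL4/cluster-1.x kernel-cman locations, so verify they exist on an FC4 build:

    cman_tool status                                  # quorum, votes, membership state
    cman_tool nodes                                   # which nodes this one still considers members
    cat /proc/cluster/services                        # fence/dlm/gfs service groups and recovery state
    cat /proc/cluster/config/cman/deadnode_timeout    # heartbeat timeout, if the file is present

If the SAN or network outage lasts longer than that timeout, every node can end up removing every other node at once, which matches the pattern in these logs.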
Dave > Jun 21 14:59:52 qmail-be-04 kernel: WARNING: dlm_emergency_shutdown > Jun 21 14:59:52 qmail-be-04 fenced[17897]: process_events: service get > event failed > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 1000041 from 3 req 1 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 3 > Jun 21 14:59:53 qmail-be-04 last message repeated 7 times > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 3 > Jun 21 14:59:53 qmail-be-04 last message repeated 6 times > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 1000041 from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 3 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 1 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 1000041 from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 1000041 from 3 req 1 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 1 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 1 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003b from 3 req 5 > Jun 21 14:59:53 qmail-be-04 last message repeated 5 times > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 5 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003f from 3 req 5 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 9 > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 5 > Jun 21 14:59:53 qmail-be-04 last message repeated 3 times > Jun 21 14:59:53 qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 9 > Jun 21 14:59:53 
qmail-be-04 kernel: dlm: process_cluster_request invalid > lockspace 100003d from 3 req 5 > Jun 21 14:59:54 qmail-be-04 last message repeated 20 times > Jun 21 14:59:54 qmail-be-04 kernel: dlm: dlm_unlock: lkid 3b013d > lockspace not found > Jun 21 14:59:54 qmail-be-04 kernel: store004-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore001-001 add_to_requestq cmd 3 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore003-004 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-004 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-002 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore001-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore002-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore003-002 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-004 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore004-001 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: mstore003-003 add_to_requestq cmd 5 fr 3 > Jun 21 14:59:54 qmail-be-04 kernel: type 2 event 282 flags 21a > Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_start 282 done 1 > Jun 21 14:59:54 qmail-be-04 kernel: 28975 pr_finish flags 1a > Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start last_stop 273 > last_start 283 last_finish 273 > Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start count 5 type 2 event > 283 flags 21a > Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_start 283 done 1 > Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a > Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start last_stop 283 > last_start 285 last_finish 283 > Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start count 6 type 2 event > 285 flags 21a > Jun 21 14:59:54 qmail-be-04 kernel: 28957 pr_start 285 done 1 > Jun 21 14:59:54 qmail-be-04 kernel: 28956 pr_finish flags 1a > Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start last_stop 116 > last_start 287 last_finish 116 > Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start count 4 type 2 event > 287 flags 21a > Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_start 287 done 1 > Jun 21 14:59:54 qmail-be-04 kernel: 28992 pr_finish flags 1a > Jun 21 14:59:55 qmail-be-04 kernel: 28975 rereq 2,1cec36 id e029f 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start last_stop 282 > last_start 289 last_finish 282 > Jun 21 14:59:55 
qmail-be-04 kernel: 28975 pr_start count 5 type 2 event > 289 flags 21a > Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_start 289 done 1 > Jun 21 14:59:55 qmail-be-04 kernel: 28975 pr_finish flags 1a > Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start last_stop 287 > last_start 291 last_finish 287 > Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start count 5 type 2 event > 291 flags 21a > Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_start 291 done 1 > Jun 21 14:59:55 qmail-be-04 kernel: 28992 pr_finish flags 1a > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fe4b id a001e 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,3fd13 id 7007a 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faaf id 90009 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fdd6 id c0135 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,4fc8f id c023b 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,34a id 8011d 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,2fe4b id b03c3 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faaf id b001e 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,34a id 11000e 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,6faa8 id f0016 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,6faa8 id 1001a8 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fbd9 id f00e9 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,5fb4d id 802ac 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,5fbd9 id f0026 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 5,3fd13 id c009b 3,0 > Jun 21 14:59:55 qmail-be-04 kernel: 28974 rereq 2,2fdd6 id 8001d 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,4fc8f id c0367 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 rereq 2,2fe40 id a01fd 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start last_stop 289 > last_start 293 last_finish 289 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start count 6 type 2 event > 293 flags 21a > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,7fa97 id 1502b3 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,3fcea id 702f3 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,5fc1e id 6015f 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fc1e id c01c1 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,2fdfa id c0362 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,2fdfa id c02ad 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,5fb4d id f01ce 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 2,7fa97 id d0293 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28975 rereq 5,3fcea id d02ad 3,0 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_start 293 done 1 > Jun 21 14:59:56 qmail-be-04 kernel: 28974 pr_finish flags 1a > Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 118 > last_start 295 last_finish 118 > Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start count 4 type 2 event > 295 flags 21a > Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start 295 done 1 > Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_finish flags 1a > Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start last_stop 291 > last_start 297 last_finish 291 > Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start count 6 type 2 event > 297 flags 21a > Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_start 297 done 1 > Jun 21 14:59:56 qmail-be-04 kernel: 28991 pr_finish flags 1a > Jun 21 14:59:56 qmail-be-04 kernel: 29174 pr_start last_stop 295 > last_start 299 last_finish 295 > Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start count 5 type 2 
event > 299 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_start 299 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29174 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start last_stop 299 > last_start 301 last_finish 299 > Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start count 6 type 2 event > 301 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_start 301 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29175 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 120 > last_start 303 last_finish 120 > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 4 type 2 event > 303 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 303 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 122 > last_start 305 last_finish 122 > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 4 type 2 event > 305 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 305 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start last_stop 124 > last_start 308 last_finish 124 > Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start count 4 type 2 event > 308 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29458 pr_start 308 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29457 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start last_stop 303 > last_start 309 last_finish 303 > Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start count 5 type 2 event > 309 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29191 pr_start 309 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start last_stop 305 > last_start 311 last_finish 305 > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start count 5 type 2 event > 311 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_start 311 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29408 pr_finish flags 1a > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start last_stop 309 > last_start 313 last_finish 309 > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start count 6 type 2 event > 313 flags 21a > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_start 313 done 1 > Jun 21 14:59:57 qmail-be-04 kernel: 29192 pr_finish flags 1a > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 308 > last_start 315 last_finish 308 > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 5 type 2 event > 315 flags 21a > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 315 done 1 > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_finish flags 1a > Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start last_stop 311 > last_start 317 last_finish 311 > Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start count 6 type 2 event > 317 flags 21a > Jun 21 14:59:58 qmail-be-04 kernel: 29409 pr_start 317 done 1 > Jun 21 14:59:58 qmail-be-04 kernel: 29408 pr_finish flags 1a > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start last_stop 315 > last_start 319 last_finish 315 > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start count 6 type 2 event > 319 flags 21a > Jun 21 14:59:58 qmail-be-04 kernel: 29457 rereq 2,2bd7b5 id 801e5 3,0 > Jun 21 14:59:58 qmail-be-04 kernel: 29458 pr_start 319 done 1 > Jun 21 14:59:58 qmail-be-04 kernel: 29457 pr_finish flags 1a > Jun 21 14:59:58 qmail-be-04 kernel: > Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: Assertion failed on line > 357 of file 
/soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c > Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: assertion: "!error" > Jun 21 14:59:58 qmail-be-04 kernel: lock_dlm: time = 2512697 > Jun 21 14:59:58 qmail-be-04 kernel: mstore008-002: error=-22 > num=2,7798a8 lkf=10001 flags=84 > > Is the second panic today :( > Thanks again > German From gstaltari at arnet.net.ar Wed Jun 21 19:50:07 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 16:50:07 -0300 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <20060621194249.GB6765@redhat.com> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com> Message-ID: <4499A2EF.7020005@arnet.net.ar> David Teigland wrote: > On Wed, Jun 21, 2006 at 03:41:58PM -0300, German Staltari wrote: > >> Jun 21 14:59:17 qmail-be-04 kernel: CMAN: removing node qmail-be-02 from >> the cluster : Missed too many heartbeats >> Jun 21 14:59:23 qmail-be-04 kernel: CMAN: removing node qmail-be-01 from >> the cluster : No response to messages >> Jun 21 14:59:29 qmail-be-04 kernel: CMAN: removing node qmail-be-06 from >> the cluster : No response to messages >> Jun 21 14:59:39 qmail-be-04 kernel: CMAN: removing node qmail-be-03 from >> the cluster : No response to messages >> Jun 21 14:59:46 qmail-be-04 kernel: CMAN: removing node qmail-be-05 from >> the cluster : No response to messages >> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: quorum lost, blocking activity >> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: node qmail-be-04 has been >> removed from the cluster : No response to messages >> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: killed by NODEDOWN message >> Jun 21 14:59:52 qmail-be-04 kernel: CMAN: we are leaving the cluster. No >> response to messages >> > > This is what led to the gfs panic, the cluster shut down when it lost > contact with all the other nodes. > > Dave > > Ok, but this node lost contact with the cluster because all the other nodes get the same panic at the same time. We had another panic a few minutes ago... 3rd panic today... the same logs output... Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: Assertion failed on line 357 of file /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: assertion: "!error" Jun 21 16:13:55 qmail-be-01 kernel: lock_dlm: time = 951351 Jun 21 16:13:55 qmail-be-01 kernel: mstore008-004: error=-22 num=2,75c6db lkf=10000 flags=84 Jun 21 16:13:55 qmail-be-01 kernel: Jun 21 16:13:55 qmail-be-01 kernel: ------------[ cut here ]------------ Jun 21 16:13:55 qmail-be-01 kernel: kernel BUG at /soft/kernel/cluster-1.02.00/gfs-kernel/src/dlm/lock.c:357! 
Jun 21 16:13:55 qmail-be-01 kernel: invalid opcode: 0000 [#1] Jun 21 16:13:55 qmail-be-01 kernel: SMP Jun 21 16:13:55 qmail-be-01 kernel: Modules linked in: nfsd exportfs lockd nfs_acl sunrpc gfs lock_dlm lock_harness dlm cman dm_round_robin dm_multipath ipv6 ohci_hcd i2c_piix4 i2c_core e1000 sg ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptspi mptscsih mptbase sd_mod scsi_mod Jun 21 16:13:55 qmail-be-01 kernel: CPU: 6 Jun 21 16:13:55 qmail-be-01 kernel: EIP: 0060:[] Tainted: GF VLI Jun 21 16:13:55 qmail-be-01 kernel: EFLAGS: 00010296 (2.6.16.11-gds #1) Jun 21 16:13:55 qmail-be-01 kernel: EIP is at do_dlm_unlock+0xd1/0xe5 [lock_dlm] Jun 21 16:13:55 qmail-be-01 kernel: eax: 00000004 ebx: 00000084 ecx: ffffebd8 edx: 00000000 Jun 21 16:13:55 qmail-be-01 kernel: esi: 00010000 edi: ffffffea ebp: ca4265c0 esp: d741eef4 Jun 21 16:13:56 qmail-be-01 kernel: ds: 007b es: 007b ss: 0068 Jun 21 16:13:56 qmail-be-01 kernel: Process gfs_glockd (pid: 1061, threadinfo=d741e000 task=d6b40550) Jun 21 16:13:56 qmail-be-01 kernel: Stack: <0>f902b673 f53267e0 ffffffea 00000002 0075c6db 00000000 00010000 00000084 Jun 21 16:13:56 qmail-be-01 kernel: 00000002 f9732000 00000003 ca4265c0 cec8a4ac f902552e f905e6b5 cec8a4dc Jun 21 16:13:56 qmail-be-01 kernel: cec8a4c8 cec8a4dc f9055f02 00000296 000000d0 f9732000 f9089ee0 c539f9c0 Jun 21 16:13:56 qmail-be-01 kernel: Call Trace: Jun 21 16:13:56 qmail-be-01 kernel: [] lm_dlm_unlock+0x14/0x1c [lock_dlm] Jun 21 16:13:56 qmail-be-01 kernel: [] gfs_lm_unlock+0x2c/0x47 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] gfs_glock_drop_th+0x84/0x182 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] run_queue+0x348/0x374 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] handle_callback+0xe6/0x120 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] unlock_on_glock+0x1b/0x24 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] gfs_reclaim_glock+0xbc/0x170 [gfs] Jun 21 16:13:56 qmail-be-01 kernel: [] _spin_lock_irqsave+0x9/0xd Jun 21 16:13:56 qmail-be-01 kernel: [] gfs_glockd+0xda/0xff [gfs] From teigland at redhat.com Wed Jun 21 20:06:16 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 15:06:16 -0500 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <4499A2EF.7020005@arnet.net.ar> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com> <4499A2EF.7020005@arnet.net.ar> Message-ID: <20060621200616.GC6765@redhat.com> On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote: > Ok, but this node lost contact with the cluster because all the other > nodes get the same panic at the same time. Any messages preceding the panicks on the other nodes? Dave From gstaltari at arnet.net.ar Wed Jun 21 21:04:36 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 18:04:36 -0300 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <20060621200616.GC6765@redhat.com> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com> <4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com> Message-ID: <4499B464.9090902@arnet.net.ar> David Teigland wrote: > On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote: > >> Ok, but this node lost contact with the cluster because all the other >> nodes get the same panic at the same time. >> > > Any messages preceding the panicks on the other nodes? 
> Dave > > > I've attached a file with the logs of all nodes at the time of the last cluster panic. Thanks for your help! German -------------- next part -------------- A non-text attachment was scrubbed... Name: kernel-oops.txt.gz Type: application/x-gzip Size: 9281 bytes Desc: not available URL: From teigland at redhat.com Wed Jun 21 21:21:33 2006 From: teigland at redhat.com (David Teigland) Date: Wed, 21 Jun 2006 16:21:33 -0500 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <4499B464.9090902@arnet.net.ar> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com> <4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com> <4499B464.9090902@arnet.net.ar> Message-ID: <20060621212132.GD6765@redhat.com> On Wed, Jun 21, 2006 at 06:04:36PM -0300, German Staltari wrote: > David Teigland wrote: > >On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote: > > > >>Ok, but this node lost contact with the cluster because all the other > >>nodes get the same panic at the same time. > > > >Any messages preceding the panicks on the other nodes? > > > I've attached a file with the logs of all nodes at the time of the last > cluster panic. It looks like cman is shutting the cluster down everywhere prior to any gfs problems anywhere. I wonder if you might have this bug: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777 which is fixed in CVS: -rSTABLE Checking in cnxman.c; /cvs/cluster/cluster/cman-kernel/src/Attic/cnxman.c,v <-- cnxman.c new revision: 1.42.2.12.4.1.2.12; previous revision: 1.42.2.12.4.1.2.11 done Checking in membership.c; /cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v <-- membership.c new revision: 1.44.2.18.6.5; previous revision: 1.44.2.18.6.4 done Dave From gstaltari at arnet.net.ar Wed Jun 21 21:26:57 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Wed, 21 Jun 2006 18:26:57 -0300 Subject: [Linux-cluster] kernel panic - help! In-Reply-To: <20060621212132.GD6765@redhat.com> References: <44998B96.8010001@arnet.net.ar> <20060621183429.GC4706@redhat.com> <449992F6.9060608@arnet.net.ar> <20060621194249.GB6765@redhat.com> <4499A2EF.7020005@arnet.net.ar> <20060621200616.GC6765@redhat.com> <4499B464.9090902@arnet.net.ar> <20060621212132.GD6765@redhat.com> Message-ID: <4499B9A1.2070606@arnet.net.ar> David Teigland wrote: > On Wed, Jun 21, 2006 at 06:04:36PM -0300, German Staltari wrote: > >> David Teigland wrote: >> >>> On Wed, Jun 21, 2006 at 04:50:07PM -0300, German Staltari wrote: >>> >>> >>>> Ok, but this node lost contact with the cluster because all the other >>>> nodes get the same panic at the same time. >>>> >>> Any messages preceding the panicks on the other nodes? >>> >>> >> I've attached a file with the logs of all nodes at the time of the last >> cluster panic. >> > > It looks like cman is shutting the cluster down everywhere prior to any > gfs problems anywhere. I wonder if you might have this bug: > > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=187777 > > which is fixed in CVS: > > -rSTABLE > Checking in cnxman.c; > /cvs/cluster/cluster/cman-kernel/src/Attic/cnxman.c,v <-- cnxman.c > new revision: 1.42.2.12.4.1.2.12; previous revision: 1.42.2.12.4.1.2.11 > done > Checking in membership.c; > /cvs/cluster/cluster/cman-kernel/src/Attic/membership.c,v <-- > membership.c > new revision: 1.44.2.18.6.5; previous revision: 1.44.2.18.6.4 > done > > Dave > > > It looks like our problem, we'll be updating to the STABLE CVS version. 
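For anyone else hitting the same bug, pulling that branch looks roughly like the following. The repository path, module name and the STABLE tag are taken from the checkin lines above; the anonymous pserver host and the configure/make step are assumptions (they mirror the released tarballs), so check the project page for the current checkout instructions before relying on them.

    # anonymous checkout of the STABLE branch (pserver details assumed)
    cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster login
    cvs -d :pserver:cvs@sources.redhat.com:/cvs/cluster co -r STABLE cluster

    # rebuild against the kernel tree the modules will run on (example path)
    cd cluster
    ./configure --kernel_src=/usr/src/linux-2.6.16
    make && make install

The important part is the -r STABLE tag on the checkout, since that is the branch the cnxman.c and membership.c fixes were committed to.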
Thanks Dave :) German From vlaurenz at advance.net Thu Jun 22 04:23:58 2006 From: vlaurenz at advance.net (Vito Laurenza) Date: Thu, 22 Jun 2006 00:23:58 -0400 Subject: [Linux-cluster] Cluster Failover Scripts... Message-ID: <15509160.1150950238347.JavaMail.root@brimley.host.advance.net> Hello all, I've written a script to notify via email on Cluster Suite events (failovers, etc) and have added it to my service in cluster.conf, but I've noticed that manual failovers are not processing properly. My failover script runs, serviceB is relocated, but serviceA only stops on the source node and does not start on the destination node. Is it ok to list more than one script per service in cluster.conf? Am I going about this the wrong way? I'd appreciate any help on this matter. Below is snippet from my cluster.conf: This is basically a parent/child dependency: "drbd" will be started before "httpd", and stopped after httpd has been stopped. It might be possible to grab linux-ha's drbd resource and use it almost out of the box with RHCS, but I haven't tried this either. Either way, rgmanager should probably have a DRBD resource agent at some point. -- Lon From rpeterso at redhat.com Fri Jun 23 20:57:55 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Fri, 23 Jun 2006 15:57:55 -0500 Subject: [Linux-cluster] files with unknown state - locking problem? In-Reply-To: <449C05C6.9070406@arnet.net.ar> References: <449C05C6.9070406@arnet.net.ar> Message-ID: <449C55D3.6000901@redhat.com> German Staltari wrote: > Hi, we have a 6 node cluster with FC4, kernel 2.6.16 and the CVS > STABLE branch of the cluster software. Sometimes, some processes > (courier imap) hangs in D state. When I execute "ls -la" in the "tmp" > directory (the directory is always the same, the same mailbox) of the > mailbox that it's triyng to access the process, the answer is really > slow and this is the output: > > ?--------- ? ? ? ? ? > 1151074448.M345358P6861_courierlock.qmail-be-04 > ?--------- ? ? ? ? ? > 1151074497.M326691P7647_courierlock.qmail-be-04 > ?--------- ? ? ? ? ? > 1151074534.M524707P2198_courierlock.qmail-be-05 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:07 > 1151074538.M785749P13408_courierlock.qmail-be-03 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:09 > 1151074588.M917441P3132_courierlock.qmail-be-05 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:09 > 1151074593.M62901P3189_courierlock.qmail-be-05 > ?--------- ? ? ? ? ? > 1151074649.M845223P5214_courierlock.qmail-be-02 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 > 1151074656.M448306P28724_courierlock.qmail-be-06 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 > 1151074657.M188653P5302_courierlock.qmail-be-02 > ?--------- ? ? ? ? ? > 1151074679.M821433P4979_courierlock.qmail-be-05 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 > 1151074690.M360083P5741_courierlock.qmail-be-02 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:07 > 1151074701.M709923P29422_courierlock.qmail-be-06 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 > 1151074716.M544858P6016_courierlock.qmail-be-02 > -rw-r--r-- 1 mailuser mailuser 16 Jun 23 12:07 > 1151074731.M21587P6179_courierlock.qmail-be-02 > ?--------- ? ? ? ? ? > 1151074804.M241436P7410_courierlock.qmail-be-02 > ?--------- ? ? ? ? ? > 1151074831.M678238P17302_courierlock.qmail-be-03 > ?--------- ? ? ? ? ? 
> 1151074917.M42708P8494_courierlock.qmail-be-05 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 > 1151074918.M541477P14716_courierlock.qmail-be-04 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 > 1151074946.M520653P15248_courierlock.qmail-be-04 > ?--------- ? ? ? ? ? > 1151075037.M234721P11020_courierlock.qmail-be-02 > ?--------- ? ? ? ? ? > 1151075065.M951224P8598_courierlock.qmail-be-01 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 > 1151075082.M788480P11712_courierlock.qmail-be-02 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 > 1151075186.M911867P18565_courierlock.qmail-be-04 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:08 > 1151075210.M366861P13891_courierlock.qmail-be-02 > -rw-r--r-- 1 mailuser mailuser 17 Jun 23 12:09 > 1151075217.M850817P13366_courierlock.qmail-be-05 > ?--------- ? ? ? ? ? > 1151075252.M599978P32483_imapuid_4.qmail-be-05 > > It seems like a lock problem, but not sure. Is there any other tool > that I can use to debug this? > Thanks > German > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi German, I suspect you are right: The question marks in ls -l leads me to believe there might be a problem somewhere regarding the locking of the files. My theory is this: ls -l calls a kernel stat function to get file statistics. The stat tries to acquire an internal lock (glock), but can't, so it displays what you see instead of valid values. Perhaps courier imap is locking files, then hanging, and the process is somehow hanging around with the lock intact, or else killed abnormally where the lock is not released. Do you have any suggestions how we can recreate this problem in our lab? Regards, Bob Peterson Red Hat Cluster Suite From Quentin.Arce at Sun.COM Fri Jun 23 23:20:24 2006 From: Quentin.Arce at Sun.COM (qarce) Date: Fri, 23 Jun 2006 16:20:24 -0700 Subject: [Linux-cluster] GFS and iscsi ? Message-ID: <449C7738.2070507@Sun.Com> Hi, I just sent this to the irc room ... but everyone seems to be out. (15:53:58) *qarce:* I have a GFS ? (15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can mount my GFS block device and use it. (15:54:38) *qarce:* but I would like to to happen automatically (15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk (16:07:29) *qarce:* Comments anyone? I have tried adding a line to the fstab /dev/sdd1 /data gfs defaults 0 0 This didn't work. I tried adding a line to /etc/rc.local to mount it but this didn't work. iscsi-rescan mount -t gfs /dev/sdd1 /data If I reboot, login and just run mount -t gfs /dev/sdd1 /data it mounts just fine. Comments / Ideas / Your thoughts. Oh. I can't change change to a different back end block device. I have to use iSCSI. Thank you, Quentin From gstaltari at arnet.net.ar Fri Jun 23 22:23:05 2006 From: gstaltari at arnet.net.ar (German Staltari) Date: Fri, 23 Jun 2006 19:23:05 -0300 Subject: [Linux-cluster] files with unknown state - locking problem? In-Reply-To: <449C55D3.6000901@redhat.com> References: <449C05C6.9070406@arnet.net.ar> <449C55D3.6000901@redhat.com> Message-ID: <449C69C9.8070300@arnet.net.ar> Robert Peterson wrote: > I suspect you are right: The question marks in ls -l leads me to > believe there > might be a problem somewhere regarding the locking of the files. My > theory is this: > ls -l calls a kernel stat function to get file statistics. The stat > tries to acquire an internal > lock (glock), but can't, so it displays what you see instead of valid > values. 
> > Perhaps courier imap is locking files, then hanging, and the process > is somehow > hanging around with the lock intact, or else killed abnormally where > the lock is not released. > Do you have any suggestions how we can recreate this problem in our lab? > Robert, here is the scenario: we have a pool of webmail servers accessing mailboxes in the cluster via IMAP (courier-imap 3.0.8). One user may hit the mailbox a couple of times in a short period of time when using the webmail (webmail is stateless, so it has to reread all mailbox structure in each webmail action). Maybe the problem is in courier-imap, so I was digging it's source code, and found all locking stuff in liblock and maildir directories (maildir/maildirlock.c -> maildir_lock(), liblock/mail.c -> ll_dotlock()). I know that we are not debugging courier-imap, but it may help. I found a post of Lon http://www.redhat.com/archives/linux-cluster/2004-October/msg00306.html, that recommends using IMAP_USELOCKS and I've checked our conf and it's enabled. So, maybe the best way to recreate the problem is installing courier-imap, and access a mailbox user from different clients at the same time (in a short period of time would be best). I hope this helps, Thanks for your help and time, German So, I think it could be possible From michaelc at cs.wisc.edu Sat Jun 24 00:13:36 2006 From: michaelc at cs.wisc.edu (Mike Christie) Date: Fri, 23 Jun 2006 19:13:36 -0500 Subject: [Linux-cluster] GFS and iscsi ? In-Reply-To: <449C7738.2070507@Sun.Com> References: <449C7738.2070507@Sun.Com> Message-ID: <449C83B0.5060000@cs.wisc.edu> qarce wrote: > Hi, > > I just sent this to the irc room ... but everyone seems to be out. > > > (15:53:58) *qarce:* I have a GFS ? > (15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can > mount my GFS block device and use it. > (15:54:38) *qarce:* but I would like to to happen automatically > (15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk > (16:07:29) *qarce:* Comments anyone? > > I have tried adding a line to the fstab > > /dev/sdd1 /data gfs defaults 0 0 > See the readme http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme You need to use _netdev in that fstab entry. From Quentin.Arce at Sun.COM Sat Jun 24 00:17:26 2006 From: Quentin.Arce at Sun.COM (qarce) Date: Fri, 23 Jun 2006 17:17:26 -0700 Subject: [Linux-cluster] GFS and iscsi ? In-Reply-To: <449C83B0.5060000@cs.wisc.edu> References: <449C7738.2070507@Sun.Com> <449C83B0.5060000@cs.wisc.edu> Message-ID: <449C8496.40801@Sun.Com> Mike Christie wrote: >qarce wrote: > > >>Hi, >> >>I just sent this to the irc room ... but everyone seems to be out. >> >> >>(15:53:58) *qarce:* I have a GFS ? >>(15:54:28) *qarce:* I have setup a redhat cluster with 2 nodes. I can >>mount my GFS block device and use it. >>(15:54:38) *qarce:* but I would like to to happen automatically >>(15:54:57) *qarce:* the GFS block device I'm using is an iSCSI disk >>(16:07:29) *qarce:* Comments anyone? >> >>I have tried adding a line to the fstab >> >>/dev/sdd1 /data gfs defaults 0 0 >> >> >> > >See the readme >http://people.redhat.com/mchristi/iscsi/RHEL4/doc/readme > >You need to use _netdev in that fstab entry. > > THANK YOU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
SO, much :-) Q >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > From aneesh.kumar at gmail.com Mon Jun 26 17:58:30 2006 From: aneesh.kumar at gmail.com (Aneesh Kumar) Date: Mon, 26 Jun 2006 23:28:30 +0530 Subject: [Linux-cluster] Help with CMAN Message-ID: Hi All, I am looking at using CMAN as the membership manager for the cluster project i am working on. The project ( http://ci-linux.sf.net ) helps in writing kernel cluster services much easier and also make the service independent of transport. We TCP/IP and IP over infiniband already done. People are working on tranport using IB verbs. I am right now trying to understand how to use CMAN as the membership service. Which is the source code against i should work on. i was going through the code and found a cman-kernel and cman directory. But then cman-kernel module was making some level communication from within the kernel. Is there a documentation explaining these components and how they interact ? -aneesh From sdake at redhat.com Mon Jun 26 19:47:38 2006 From: sdake at redhat.com (Steven Dake) Date: Mon, 26 Jun 2006 12:47:38 -0700 Subject: [Linux-cluster] Help with CMAN In-Reply-To: References: Message-ID: <1151351258.30084.40.camel@shih.broked.org> Aneesh, In the latest code, the membership layer is handled entirely in userspace. The CMAN component is a plugin of the openais standards based cluster framework. openais uses a protocol called The Totem Single Ring Ordering and Membership protocol for all communication. It would be possible to feed membership messages and regular messages from totem into the kernel using configfs or some other system. I believe some of this work has already been done. Dave would know more since its his area of expertise. Regards -steve On Mon, 2006-06-26 at 23:28 +0530, Aneesh Kumar wrote: > Hi All, > > I am looking at using CMAN as the membership manager for the cluster > project i am working on. The project ( http://ci-linux.sf.net ) helps > in writing kernel cluster services much easier and also make the > service independent of transport. We TCP/IP and IP over infiniband > already done. People are working on tranport using IB verbs. > > I am right now trying to understand how to use CMAN as the > membership service. Which is the source code against i should work on. > i was going through the code and found a cman-kernel and cman > directory. But then cman-kernel module was making some level > communication from within the kernel. Is there a documentation > explaining these components and how they interact ? > > -aneesh > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From teigland at redhat.com Mon Jun 26 20:29:00 2006 From: teigland at redhat.com (David Teigland) Date: Mon, 26 Jun 2006 15:29:00 -0500 Subject: [Linux-cluster] Help with CMAN In-Reply-To: <1151351258.30084.40.camel@shih.broked.org> References: <1151351258.30084.40.camel@shih.broked.org> Message-ID: <20060626202859.GD1375@redhat.com> On Mon, Jun 26, 2006 at 12:47:38PM -0700, Steven Dake wrote: > Aneesh, > > In the latest code, the membership layer is handled entirely in > userspace. The CMAN component is a plugin of the openais standards > based cluster framework. openais uses a protocol called The Totem > Single Ring Ordering and Membership protocol for all communication. 
> > It would be possible to feed membership messages and regular messages > from totem into the kernel using configfs or some other system. I > believe some of this work has already been done. Dave would know more > since its his area of expertise. GFS and DLM now both have userland components to interact with cman/openais clustering infrastructure, see the gfs_controld and dlm_controld daemons. Dave From Fabrizio.Lippolis at AurigaInformatica.it Tue Jun 27 08:40:21 2006 From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis) Date: Tue, 27 Jun 2006 10:40:21 +0200 Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster Message-ID: <44A0EEF5.2090501@aurigainformatica.it> I have configured two machines in a cluster domain to run mysql and ldap services. Everything works correctly except that from time to time, seems randomly, the two machines hung. Recently this is what I see in the log of the second machine: Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the cluster : Missed too many heartbeats Jun 23 23:37:17 AICLSRV02 fenced[2004]: AICLSRV01 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:17 AICLSRV02 fenced[2004]: fencing node "AICLSRV01" Jun 23 23:37:17 AICLSRV02 fence_manual: Node AICLSRV01 needs to be reset before recovery can procede. Waiting for AICLSRV01 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n AICLSRV01) A few seconds later the same messages appeared on the first machine: Jun 23 23:37:36 AICLSRV01 kernel: CMAN: removing node AICLSRV02 from the cluster : Missed too many heartbeats Jun 23 23:37:36 AICLSRV01 fenced[2084]: AICLSRV02 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:36 AICLSRV01 fenced[2084]: fencing node "AICLSRV02" Jun 23 23:37:39 AICLSRV01 fence_manual: Node AICLSRV02 needs to be reset before recovery can procede. Waiting for AICLSRV02 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n AICLSRV02) The two machines have been resetted to let them work again. Anybody could please explain what happened to cause this problem? I would also need a suggestion on how to configure a fence device so that the services could still continue to work. As you see actually I configured manual fence but that's not much useful. Thank you in advance. -- Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it Auriga Informatica s.r.l. Via Don Guanella 15/B - 70124 Bari Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/ From l.dardini at comune.prato.it Tue Jun 27 08:51:12 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Tue, 27 Jun 2006 10:51:12 +0200 Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster In-Reply-To: <44A0EEF5.2090501@aurigainformatica.it> Message-ID: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local> Running a two machine cluster is a bad thing (but due to budget limitation, I am doing the same bad thing). If something happens between the two machine, they fence each other. In this particular case, I think you have some sort of network problem between the two machine. You can try to "ping" each other and see, when the problem arise, the connectivity state. Maybe a "too much intelligent switch" is handling the traffic and have some sort of "traffic shaping and control". 
Leandro -----Messaggio originale----- Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di Fabrizio Lippolis Inviato: marted? 27 giugno 2006 10.40 A: linux-cluster at redhat.com Oggetto: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster I have configured two machines in a cluster domain to run mysql and ldap services. Everything works correctly except that from time to time, seems randomly, the two machines hung. Recently this is what I see in the log of the second machine: Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the cluster : Missed too many heartbeats Jun 23 23:37:17 AICLSRV02 fenced[2004]: AICLSRV01 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:17 AICLSRV02 fenced[2004]: fencing node "AICLSRV01" Jun 23 23:37:17 AICLSRV02 fence_manual: Node AICLSRV01 needs to be reset before recovery can procede. Waiting for AICLSRV01 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n AICLSRV01) A few seconds later the same messages appeared on the first machine: Jun 23 23:37:36 AICLSRV01 kernel: CMAN: removing node AICLSRV02 from the cluster : Missed too many heartbeats Jun 23 23:37:36 AICLSRV01 fenced[2084]: AICLSRV02 not a cluster member after 0 sec post_fail_delay Jun 23 23:37:36 AICLSRV01 fenced[2084]: fencing node "AICLSRV02" Jun 23 23:37:39 AICLSRV01 fence_manual: Node AICLSRV02 needs to be reset before recovery can procede. Waiting for AICLSRV02 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n AICLSRV02) The two machines have been resetted to let them work again. Anybody could please explain what happened to cause this problem? I would also need a suggestion on how to configure a fence device so that the services could still continue to work. As you see actually I configured manual fence but that's not much useful. Thank you in advance. -- Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it Auriga Informatica s.r.l. Via Don Guanella 15/B - 70124 Bari Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From pcaulfie at redhat.com Tue Jun 27 09:01:59 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Tue, 27 Jun 2006 10:01:59 +0100 Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster In-Reply-To: <44A0EEF5.2090501@aurigainformatica.it> References: <44A0EEF5.2090501@aurigainformatica.it> Message-ID: <44A0F407.4090403@redhat.com> Fabrizio Lippolis wrote: > I have configured two machines in a cluster domain to run mysql and ldap > services. Everything works correctly except that from time to time, > seems randomly, the two machines hung. Recently this is what I see in > the log of the second machine: > > Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the > cluster : Missed too many heartbeats That message means that the heartbeat messages are getting lost somehow. either through an unreliable network link or something else odd happening on the machine to prevent the heartbeat packets reaching the network. > > The two machines have been resetted to let them work again. Anybody > could please explain what happened to cause this problem? I would also > need a suggestion on how to configure a fence device so that the > services could still continue to work. 
As you see actually I configured > manual fence but that's not much useful. Thank you in advance. > -- patrick From Fabrizio.Lippolis at AurigaInformatica.it Tue Jun 27 09:51:58 2006 From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis) Date: Tue, 27 Jun 2006 11:51:58 +0200 Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster In-Reply-To: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local> References: <0C5C8B118420264EBB94D7D7050150012A0028@exchange2.comune.prato.local> Message-ID: <44A0FFBE.4040904@aurigainformatica.it> Leandro Dardini ha scritto: > If something happens between the two machine, they fence each other. I have configured manual fencing but as I wrote it's not much useful since, I think, requires manual handling which couldn't be possible immediately. Therefore I am looking for a method to let the services run even if such a thing happens. This is not the first time the problem arises, apparently without a reason, though the last time happened long time ago. > You can try to "ping" each other and see, when the problem arise, the connectivity state. Sometimes the machines are completely locked and it's not even possible to log in. A brute force switch off is necessary in this case. Sometimes looks like only the cluster service is locked and I can regularly ping the other machine though the cluster is not working. > Maybe a "too much intelligent switch" is handling the traffic and have some sort of "traffic shaping and control". There is nothing like that, the two machines are connected by a 1GB crossover cable, not even so long, provided by HP with the two machines. -- Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it Auriga Informatica s.r.l. Via Don Guanella 15/B - 70124 Bari Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/ From l.dardini at comune.prato.it Tue Jun 27 10:04:36 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Tue, 27 Jun 2006 12:04:36 +0200 Subject: R: R: [Linux-cluster] "Missed too many heartbeats" messages andhung cluster In-Reply-To: <44A0FFBE.4040904@aurigainformatica.it> Message-ID: <0C5C8B118420264EBB94D7D7050150012A0034@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Fabrizio Lippolis > Inviato: marted? 27 giugno 2006 11.52 > A: linux clustering > Oggetto: Re: R: [Linux-cluster] "Missed too many heartbeats" > messages andhung cluster > > Leandro Dardini ha scritto: > > > If something happens between the two machine, they fence each other. > > I have configured manual fencing but as I wrote it's not much > useful since, I think, requires manual handling which > couldn't be possible immediately. Therefore I am looking for > a method to let the services run even if such a thing > happens. This is not the first time the problem arises, > apparently without a reason, though the last time happened > long time ago. > > > You can try to "ping" each other and see, when the problem > arise, the connectivity state. > > Sometimes the machines are completely locked and it's not > even possible to log in. A brute force switch off is > necessary in this case. Sometimes looks like only the cluster > service is locked and I can regularly ping the other machine > though the cluster is not working. This is really bad. This smells like an hardware problem or buggy kernel driver. Try to stress test the machines individually without cluster support. 
I usually start with a memtest from a Knoppix CD and then build a kernel for CPU stress. Try to transfer huge chunk of data to test the lan. Leandro > > > Maybe a "too much intelligent switch" is handling the > traffic and have some sort of "traffic shaping and control". > > There is nothing like that, the two machines are connected by > a 1GB crossover cable, not even so long, provided by HP with > the two machines. > > -- > Fabrizio Lippolis > fabrizio.lippolis at aurigainformatica.it > Auriga Informatica s.r.l. Via Don Guanella 15/B - > 70124 Bari > Tel.: 080/5025414 - Fax: 080/5027448 - > http://www.aurigainformatica.it/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From vaden at texoma.net Tue Jun 27 13:24:43 2006 From: vaden at texoma.net (Larry Vaden) Date: Tue, 27 Jun 2006 08:24:43 -0500 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui Message-ID: The RH documentation apparently presumes/requires the use of gui/X11. What's the best howto if one chooses not to use gui/X11 on the servers to be clustered? Kind regards, Larry Vaden Internet Texoma, Inc. From Fabrizio.Lippolis at AurigaInformatica.it Tue Jun 27 13:35:35 2006 From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis) Date: Tue, 27 Jun 2006 15:35:35 +0200 Subject: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster In-Reply-To: <44A0F407.4090403@redhat.com> References: <44A0EEF5.2090501@aurigainformatica.it> <44A0F407.4090403@redhat.com> Message-ID: <44A13427.7080709@aurigainformatica.it> Patrick Caulfield ha scritto: >> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the >> cluster : Missed too many heartbeats > > > That message means that the heartbeat messages are getting lost somehow. > either through an unreliable network link or something else odd happening on > the machine to prevent the heartbeat packets reaching the network. This is very strange since the two machines are connected by a gigabit crossover cable and no other device is in the middle. Also, no firewall rules are configured on any machine. By the way, actually I am using the fence manual method but it isn't much helpful and I would like to switch to a method that ensures a reliable service. Does it mean I have to buy a device sitting in the middle of the machines that connects network and power cables? I am rather new to it so please any suggestion is welcome. -- Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it Auriga Informatica s.r.l. Via Don Guanella 15/B - 70124 Bari Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/ From Matthew.Patton.ctr at osd.mil Tue Jun 27 13:41:07 2006 From: Matthew.Patton.ctr at osd.mil (Patton, Matthew F, CTR, OSD-PA&E) Date: Tue, 27 Jun 2006 09:41:07 -0400 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui Message-ID: Classification: UNCLASSIFIED I noticed that too. Redhat, please make it a policy such that RHEL4.4 onward that command-line tools will be the first and primary means of configuring anything not expressly GUI-related. requiring GUI tools to admin a server is well, unprintable. > -----Original Message----- > The RH documentation apparently presumes/requires the use of gui/X11. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rpeterso at redhat.com Tue Jun 27 14:06:34 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 27 Jun 2006 09:06:34 -0500 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui In-Reply-To: References: Message-ID: <44A13B6A.2060503@redhat.com> Larry Vaden wrote: > The RH documentation apparently presumes/requires the use of gui/X11. > > What's the best howto if one chooses not to use gui/X11 on the servers > to be clustered? > > Kind regards, > > Larry Vaden > Internet Texoma, Inc. Hi Larry, I've mentioned this before, but I've got an "NFS/GFS Cookbook" that has step-by-step instructions for setting up a cluster using both the GUI or command-line. I think it's more geared toward command-line because I didn't even include screen-shots of the gui. It's located here: http://sources.redhat.com/cluster/doc/nfscookbook.pdf - The Unofficial NFS/GFS Cookbook. I don't think I'd call it "the best howto" because I know it needs some work. (People have sent me corrections that I haven't had time to implement yet). It's not even an official Red Hat document, but I've been trying to push it that direction. I hope this helps. And of course, if you have corrections, please send them my way and I'll eventually get time to implement them. Regards, Bob Peterson Red Hat Cluster Suite From doc at zwecker.de Tue Jun 27 15:30:56 2006 From: doc at zwecker.de (Christophe Zwecker) Date: Tue, 27 Jun 2006 17:30:56 +0200 Subject: [Linux-cluster] is PVFS Raid0 or Raid1 over Network Message-ID: <44A14F30.80600@zwecker.de> Hi, I (newbie) am currently looking into getting somthing like Raid1 over Network for our 2 Node Cluster, I ran over PVFS and not sure wether its suited for us. We want somthing like having a local storage on each node that syncs withthe storage on the other node, like saying raid1 over network. is that what pvfs is doing ? thx for clearing this up for me. Christophe -- Christophe Zwecker mail: doc at zwecker.de Hamburg, Germany fon: +49 179 3994867 http://www.zwecker.de "Reality is that which, when you stop believing in it, doesn't go away" From vaden at texoma.net Tue Jun 27 16:49:21 2006 From: vaden at texoma.net (Larry Vaden) Date: Tue, 27 Jun 2006 11:49:21 -0500 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui In-Reply-To: <44A13B6A.2060503@redhat.com> References: <44A13B6A.2060503@redhat.com> Message-ID: On 6/27/06, Robert Peterson wrote: > Larry Vaden wrote: > > The RH documentation apparently presumes/requires the use of gui/X11. > > > > What's the best howto if one chooses not to use gui/X11 on the servers > > to be clustered? > > > > Kind regards, > > > > Larry Vaden > > Internet Texoma, Inc. > Hi Larry, > > I've mentioned this before, but I've got an "NFS/GFS Cookbook" that has > step-by-step > instructions for setting up a cluster using both the GUI or > command-line. I think it's > more geared toward command-line because I didn't even include > screen-shots of the gui. > It's located here: > > http://sources.redhat.com/cluster/doc/nfscookbook.pdf - The Unofficial > NFS/GFS Cookbook. > > I don't think I'd call it "the best howto" because I know it needs some > work. > (People have sent me corrections that I haven't had time to implement yet). > It's not even an official Red Hat document, but I've been trying to push > it that > direction. I hope this helps. > > And of course, if you have corrections, please send them my way and I'll > eventually > get time to implement them. 
> > Regards, > > Bob Peterson > Red Hat Cluster Suite Hi Bob, I think I'll break list netiquette and thank you for your work and for the clue(s). Has GULM been more or less officially deprecated in favor of DLM? Any other gotchas for late adopters? rgds/ldv From rpeterso at redhat.com Tue Jun 27 17:57:44 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Tue, 27 Jun 2006 12:57:44 -0500 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui In-Reply-To: References: <44A13B6A.2060503@redhat.com> Message-ID: <44A17198.2090807@redhat.com> Larry Vaden wrote: > Hi Bob, > > I think I'll break list netiquette and thank you for your work and for > the clue(s). > > Has GULM been more or less officially deprecated in favor of DLM? > > Any other gotchas for late adopters? > > rgds/ldv Hi Larry, The plan, as I understand it, is to support GULM locking for RHEL4, but starting with RHEL5, GULM will no longer be supported in favor of DLM. That may alarm some people because Oracle is only certified to work with GULM today, but needless to say, we'll be going through re-certification with Oracle for RHEL5, so we will have alternatives. It's pretty simple to switch from GULM to DLM and back anyway. As for gotchas: Well, I've been working on a Cluster Suite FAQ (Frequently Asked Questions) that I hope to make public soon, maybe as early as this week. (I'll post something here when I do). I'm just waiting for feedback from some of the developers before I post it. Regards, Bob Peterson Red Hat Cluster Suite From kanderso at redhat.com Tue Jun 27 18:30:37 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Tue, 27 Jun 2006 13:30:37 -0500 Subject: [Linux-cluster] best csgfs howto not presuming/requiring gui In-Reply-To: <44A17198.2090807@redhat.com> References: <44A13B6A.2060503@redhat.com> <44A17198.2090807@redhat.com> Message-ID: <1151433037.2784.9.camel@dhcp80-204.msp.redhat.com> On Tue, 2006-06-27 at 12:57 -0500, Robert Peterson wrote: > Larry Vaden wrote: > The plan, as I understand it, is to support GULM locking for RHEL4, but > starting with RHEL5, GULM will no longer be supported in favor of DLM. > That may alarm some people because Oracle is only certified to work with > GULM today, but needless to say, we'll be going through re-certification > with Oracle for RHEL5, so we will have alternatives. It's pretty simple > to switch from GULM to DLM and back anyway. > Also, the new upstream DLM has the ability to be configured in a client/server model arrangement, where all lock requests go to specific DLM nodes rather than being resolved locally. You can tune the usage to your configuration and still have dedicated lock manager servers if you choose. This capability makes GuLM redundant with the DLM functionality and doesn't make sense to continue to port GuLM forward and support. I think Dave Teigland has documented how to do it in either previous mailings or on the website, will let him comment with the details. Thanks Kevin From Stefano.Schiavi at aem.torino.it Mon Jun 26 13:29:38 2006 From: Stefano.Schiavi at aem.torino.it (Stefano Schiavi) Date: Mon, 26 Jun 2006 15:29:38 +0200 Subject: [Linux-cluster] GFS-6.0.2.20-2 doesn't accept rebooted nodes Message-ID: Hi gurus. We have a three nodes Itanium 64 with GFS in conjuction with OCFS for a Oracle RAC We have find many phisical problems in our switch , and made a sobstitution of the switches. 
Here is the problem : the first node of the cluster doesn't re-login to the gfs: here is the situation:made from the master : [root at sapcl02 spool]# gulm_tool nodelist sapcl02:core Name: sapcl03.aem.torino.it ip = 100.2.254.210 state = Logged in mode = Slave missed beats = 0 last beat = 1151328027843676 delay avg = 10000443 max delay = 13047588 Name: sapcl01.aem.torino.it ip = 100.2.254.208 state = Expired mode = Slave missed beats = 0 last beat = 0 delay avg = 0 max delay = 0 Name: sapcl02.aem.torino.it ip = 100.2.254.209 state = Logged in mode = Master missed beats = 0 last beat = 1151328021593557 delay avg = 10000849 max delay = 113821588141 as you can see ...sapcl01 is in state expired. In sapcl01 the startint of lock_gulmd hung .... >From the /var/log/message of the master i see ....infinitely repetuted .... Jun 26 15:23:32 sapcl02 lock_gulmd_core[22601]: Gonna exec fence_node sapcl01.aem.torino.it Jun 26 15:23:32 sapcl02 fence_node[22601]: Cannot locate the cluster node, sapcl01.aem.torino.it Jun 26 15:23:32 sapcl02 fence_node[22601]: All fencing methods FAILED! Jun 26 15:23:32 sapcl02 fence_node[22601]: Fence of "sapcl01.aem.torino.it" was unsuccessful. Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Fence failed. [22601] Exit code:1 Running it again. Jun 26 15:23:32 sapcl02 lock_gulmd_core[7499]: Forked [22604] fence_node sapcl01.aem.torino.it with a 5 pause. also if i power down the sapcl01 node , the master try and try to fence the slave node. Also , in the master and the slave , i try to manually fence for eliminate the expiration . But no results. It seems that the only way to reallinate the cluster is to GLOBALLY power down the entire nodes , and restart. here is the configuration files: ########### fence.ccs ######################################## fence_devices { nps { agent = "fence_wti" ipaddr = "100.2.254.254" login = "nps" passwd = "password" } } [root at sapcl01 gfs]# more nodes.ccs #### nodes.ccs ####################################### nodes { sapcl01 { ip_interfaces { eth1 = "192.168.2.208" } fence { power { nps { port = 1 } } } } sapcl02 { ip_interfaces { eth1 = "192.168.2.209" } fence { power { nps { port = 2 } } } } sapcl03 { ip_interfaces { eth1 = "192.168.2.210" } fence { power { nps { port = 3 } } [root at sapcl01 gfs]# more cluster.ccs #### cluster.ccs ##################################### cluster { name = "gfsrac" lock_gulm { servers = [ "sapcl01","sapcl02","sapcl03" ] } } PS the cluster is was fully operational from 7 months ago. the change of the switch is the problem Best regards Stefano From troy.stepan at unisys.com Mon Jun 26 14:38:43 2006 From: troy.stepan at unisys.com (Stepan, Troy) Date: Mon, 26 Jun 2006 10:38:43 -0400 Subject: [Linux-cluster] Equivalent to RHCS? Message-ID: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com> I hate to bother you guys with dumb questions, but I'm confused-- Looking at the components, it looks like this project is the core of Red Hat Cluster Suite. I take it RHCS itself is not open, but its "root" components are? How does RHCS compare to these open projects? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pcaulfie at redhat.com Wed Jun 28 07:08:45 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Wed, 28 Jun 2006 08:08:45 +0100 Subject: [Linux-cluster] Equivalent to RHCS? 
In-Reply-To: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com> References: <94C8C9E8B25F564F95185BDA64AB05F603C3E92C@USTR-EXCH5.na.uis.unisys.com> Message-ID: <44A22AFD.7090506@redhat.com> Stepan, Troy wrote: > I hate to bother you guys with dumb questions, but I?m confused-- > > > > Looking at the components, it looks like this project is the core of Red > Hat Cluster Suite. I take it RHCS itself is not open, Wrong, all the source code for RHCS is open. http://sources.redhat.com/cluster/ -- patrick From jstoner at opsource.net Wed Jun 28 14:29:07 2006 From: jstoner at opsource.net (Jeff Stoner) Date: Wed, 28 Jun 2006 15:29:07 +0100 Subject: [Linux-cluster] Equivalent to RHCS? Message-ID: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net> It's all Open Source. Redhat is only obligated to make the source code available (whether that's through CVS, tarballs or SRPMs.) Purchasing a license entitles you to get support from Redhat (via phone, web or email) and allows you to download binary RPMs and get compiled updates via up2date from their servers. Without a license, you have to build the software yourself or get RPMs from some place like rpmfind.net, and support is pretty much limited to mailing lists and message forums (where you may not get an answer.) --Jeff SME - UNIX OpSource Inc. PGP Key ID 0x6CB364CA I hate to bother you guys with dumb questions, but I'm confused-- Looking at the components, it looks like this project is the core of Red Hat Cluster Suite. I take it RHCS itself is not open, but its "root" components are? How does RHCS compare to these open projects? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From riaan at obsidian.co.za Wed Jun 28 14:41:03 2006 From: riaan at obsidian.co.za (Riaan van Niekerk) Date: Wed, 28 Jun 2006 16:41:03 +0200 (SAST) Subject: [Linux-cluster] Equivalent to RHCS? In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net> Message-ID: Jeff you are mostly correct, but I want to make a slight (but IMHO important correction). What you pay for is not a licence or right to use but a "subscription", which consists of access to binaries (including updates and upgrades for the duration of your subscription), support, certification and open source assurance. or s/licence/subscription/ and I second your answer. Riaan On Wed, 28 Jun 2006, Jeff Stoner wrote: > It's all Open Source. Redhat is only obligated to make the source code > available (whether that's through CVS, tarballs or SRPMs.) Purchasing a > license entitles you to get support from Redhat (via phone, web or > email) and allows you to download binary RPMs and get compiled updates > via up2date from their servers. > > Without a license, you have to build the software yourself or get RPMs > from some place like rpmfind.net, and support is pretty much limited to > mailing lists and message forums (where you may not get an answer.) > > > --Jeff > SME - UNIX > OpSource Inc. > > PGP Key ID 0x6CB364CA > > > > > > > I hate to bother you guys with dumb questions, but I'm > confused-- > > > > Looking at the components, it looks like this project is the > core of Red Hat Cluster Suite. I take it RHCS itself is not open, but > its "root" components are? How does RHCS compare to these open > projects? > > > > Thanks in advance. 
From frank at opticalart.de Wed Jun 28 14:51:42 2006 From: frank at opticalart.de (Frank Hellmann) Date: Wed, 28 Jun 2006 16:51:42 +0200 Subject: [Linux-cluster] Equivalent to RHCS? In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net> References: <38A48FA2F0103444906AD22E14F1B5A303AFA6B4@mailxchg01.corp.opsource.net> Message-ID: <44A2977E.2060005@opticalart.de> Hi! If you look at projects like CentOS ( http://www.centos.org/ ) you will find a complete free RedHat clone including the Cluster Suite as precompiled packages. If you don't need any assistance and can live without extra services this might be another route to go, instead of compiling everything yourself. Cheers, Frank... PS: I don't want to argue about the free/freedom/moral issues in using a clone distribution. There's a thread at CentOS covering that to a good extent: http://www.centos.org/modules/newbb/viewtopic.php?topic_id=3642&forum=23 Jeff Stoner wrote: > It's all Open Source. Redhat is only obligated to make the source code > available (whether that's through CVS, tarballs or SRPMs.) Purchasing > a license entitles you to get support from Redhat (via phone, web or > email) and allows you to download binary RPMs and get compiled updates > via up2date from their servers. > Without a license, you have to build the software yourself or get RPMs > from some place like rpmfind.net, and support is pretty much limited > to mailing lists and message forums (where you may not get an answer.) > > --Jeff > SME - UNIX > OpSource Inc. > > PGP Key ID 0x6CB364CA > > > I hate to bother you guys with dumb questions, but I?m confused-- > > Looking at the components, it looks like this project is the core > of Red Hat Cluster Suite. I take it RHCS itself is not open, but > its ?root? components are? How does RHCS compare to these open > projects? > > Thanks in advance. > >------------------------------------------------------------------------ > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > -- -------------------------------------------------------------------------- Frank Hellmann Optical Art GmbH Waterloohain 7a DI Supervisor http://www.opticalart.de 22769 Hamburg frank at opticalart.de Tel: ++49 40 5111051 Fax: ++49 40 43169199 From DERRICK.BEERY at iowa.gov Wed Jun 28 16:16:30 2006 From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS]) Date: Wed, 28 Jun 2006 11:16:30 -0500 Subject: [Linux-cluster] cluster-1.02.00 make errors Message-ID: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us> Make against 2.6.16 vanilla kernel is failing with the following errors: CC [M] /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function `user_eo_get': /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few arguments to function `permission' /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function `user_eo_set': /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few arguments to function `permission' /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function `user_eo_remove': /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too few arguments to function `permission' make[5]: *** [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1 make[4]: *** [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2 Can anyone shed some light on this? Thanks in advance! 
Derrick Beery DAS ITE State of Iowa -------------- next part -------------- An HTML attachment was scrubbed... URL: From ranjtech at gmail.com Wed Jun 28 17:38:14 2006 From: ranjtech at gmail.com (RR) Date: Thu, 29 Jun 2006 03:38:14 +1000 Subject: [Linux-cluster] two node cluster not coming up Message-ID: <000001c69ad9$a462d8c0$ed288a40$@com> Hello all, Just installed a 2-node cluster 15 mins ago, by the book, using bob peterson's cookbook and the official CS guide, but only one of my nodes comes up and joins the cluster, the other one stays in the "joining" state with the message in "/var/log/messages" file stating cman: cman_tool: Node is already active failed it had tried pretty hard to join the cluster when I was first bringing it up and even says "Connected to cluster infrastructure via: CMAN/SM Plugin v1.1.5" Initial status:: Inquorate Remote copy of cluster.conf is from quorate node. Local Version # : 3 Remote version #: 3 Note, I don't have the "fenced" running yet. Also I had specified in my cluster.conf file thusly What am I doing wrong? Thx \R From rohara at redhat.com Wed Jun 28 18:44:48 2006 From: rohara at redhat.com (Ryan O'Hara) Date: Wed, 28 Jun 2006 13:44:48 -0500 Subject: [Linux-cluster] cluster-1.02.00 make errors In-Reply-To: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us> References: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us> Message-ID: <44A2CE20.8040003@redhat.com> Derrick, I was able to build the cluster-1.02.00 code against the vanilla 2.6.16 kernel. Did you run the configure script with the --kernel_src option to point to the correct kernel tree? ./configure --kernel_src=/path/to/kernel Ryan Beery, Derrick [DAS] wrote: > Make against 2.6.16 vanilla kernel is failing with the following errors: > > > > CC [M] /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function > `user_eo_get': > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few > arguments to function `permission' > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function > `user_eo_set': > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few > arguments to function `permission' > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In function > `user_eo_remove': > > /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too few > arguments to function `permission' > > make[5]: *** [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] > Error 1 > > make[4]: *** [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] > Error 2 > > > > Can anyone shed some light on this? > > > Thanks in advance! 
> > > > Derrick Beery > > DAS ITE State of Iowa > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Wed Jun 28 20:29:57 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Wed, 28 Jun 2006 15:29:57 -0500 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <000001c69ad9$a462d8c0$ed288a40$@com> References: <000001c69ad9$a462d8c0$ed288a40$@com> Message-ID: <44A2E6C5.5080802@redhat.com> RR wrote: > Hello all, > > Just installed a 2-node cluster 15 mins ago, by the book, using bob > peterson's cookbook and the official CS guide, but only one of my nodes > comes up and joins the cluster, the other one stays in the "joining" state > with the message in "/var/log/messages" file stating > > cman: cman_tool: Node is already active failed > > it had tried pretty hard to join the cluster when I was first bringing it up > and even says > > "Connected to cluster infrastructure via: CMAN/SM Plugin v1.1.5" > Initial status:: Inquorate > Remote copy of cluster.conf is from quorate node. > Local Version # : 3 > Remote version #: 3 > > Note, I don't have the "fenced" running yet. Also I had specified in my > cluster.conf file thusly > > > > > What am I doing wrong? > > Thx > \R > Hi RR, You didn't give us much to go on. When looking into these kinds of problems, it's always nice to see the /etc/cluster/cluster.conf file, and possibly the output of clustat, and cman_tool status from the cluster nodes, in this case, from the node that seemed to work. Regards, Bob Peterson Red Hat Cluster Suite From ranjtech at gmail.com Thu Jun 29 04:55:33 2006 From: ranjtech at gmail.com (RR) Date: Thu, 29 Jun 2006 14:55:33 +1000 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <44A2E6C5.5080802@redhat.com> References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com> Message-ID: <000f01c69b38$42d95110$c88bf330$@com> Hi Bob, Attached is the cluster.conf file, and below is the status I get from the command "service cman status" on the working node: Svr00# service cman status Protocol version: 5.0.1 Config version: 4 Cluster name: testcluster Cluster ID: 27453 Cluster Member: Yes Membership state: Cluster-Member Nodes: 1 Expected_votes: 1 Total_votes: 1 Quorum: 1 Active subsystems: 0 Node name: svr00 Node addresses: 10.1.3.64 svr00# clustat Member Status: Quorate Resource Group Manager not running; no service information available. Member Name Status ------ ---- ------ svr00 Online, Local svr01 Offline I rebooted svr01 and now it just sits there at Starting clvmd: during bootup. Hope this helps in anyone understanding my issue? Do I need all the other services configured for this to work properly? i.e. clvmd, fenced, etc. etc. I just wanted to see two nodes in a cluster first before I configured any resources, services, fencing etc etc. \R -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Hi RR, You didn't give us much to go on. When looking into these kinds of problems, it's always nice to see the /etc/cluster/cluster.conf file, and possibly the output of clustat, and cman_tool status from the cluster nodes, in this case, from the node that seemed to work. Regards, Bob Peterson Red Hat Cluster Suite -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cluster.1.conf Type: application/octet-stream Size: 1549 bytes Desc: not available URL: From frank at opticalart.de Thu Jun 29 07:17:40 2006 From: frank at opticalart.de (Frank Hellmann) Date: Thu, 29 Jun 2006 09:17:40 +0200 Subject: [Linux-cluster] cluster-1.02.00 make errors In-Reply-To: <44A2CE20.8040003@redhat.com> References: <4D9680752635E9448FF261A1443DD293033C14C0@iowadsmex04.iowa.gov.state.ia.us> <44A2CE20.8040003@redhat.com> Message-ID: <44A37E94.8070103@opticalart.de> Hi Derrick, Make sure you build it this way: $ ./configure --kernel_src=/path/to/linux-2.6.x $ make install the usual make; make install won't work for various reasons. Just do a make install. Cheers, Frank... Ryan O'Hara wrote: > > Derrick, > > I was able to build the cluster-1.02.00 code against the vanilla > 2.6.16 kernel. Did you run the configure script with the --kernel_src > option to point to the correct kernel tree? > > ./configure --kernel_src=/path/to/kernel > > Ryan > > > > Beery, Derrick [DAS] wrote: > >> Make against 2.6.16 vanilla kernel is failing with the following errors: >> >> >> >> CC [M] /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_get': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few >> arguments to function `permission' >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_set': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few >> arguments to function `permission' >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_remove': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too >> few arguments to function `permission' >> >> make[5]: *** >> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1 >> >> make[4]: *** >> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2 >> >> >> >> Can anyone shed some light on this? >> >> >> Thanks in advance! >> >> >> >> Derrick Beery >> >> DAS ITE State of Iowa >> >> >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- -------------------------------------------------------------------------- Frank Hellmann Optical Art GmbH Waterloohain 7a DI Supervisor http://www.opticalart.de 22769 Hamburg frank at opticalart.de Tel: ++49 40 5111051 Fax: ++49 40 43169199 From l.dardini at comune.prato.it Thu Jun 29 07:27:38 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Thu, 29 Jun 2006 09:27:38 +0200 Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster In-Reply-To: <44A13427.7080709@aurigainformatica.it> Message-ID: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Fabrizio Lippolis > Inviato: marted? 
27 giugno 2006 15.36 > A: linux clustering > Oggetto: Re: [Linux-cluster] "Missed too many heartbeats" > messages and hungcluster > > Patrick Caulfield ha scritto: > > >> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node > AICLSRV01 from > >> the cluster : Missed too many heartbeats > > > > > > That message means that the heartbeat messages are getting > lost somehow. > > either through an unreliable network link or something else odd > > happening on the machine to prevent the heartbeat packets > reaching the network. > > This is very strange since the two machines are connected by > a gigabit crossover cable and no other device is in the > middle. Also, no firewall rules are configured on any machine. > > By the way, actually I am using the fence manual method but > it isn't much helpful and I would like to switch to a method > that ensures a reliable service. Does it mean I have to buy a > device sitting in the middle of the machines that connects > network and power cables? I am rather new to it so please any > suggestion is welcome. > A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS. A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using? Leandro > -- > Fabrizio Lippolis > fabrizio.lippolis at aurigainformatica.it > Auriga Informatica s.r.l. Via Don Guanella 15/B - > 70124 Bari > Tel.: 080/5025414 - Fax: 080/5027448 - > http://www.aurigainformatica.it/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From Fabrizio.Lippolis at AurigaInformatica.it Thu Jun 29 07:47:02 2006 From: Fabrizio.Lippolis at AurigaInformatica.it (Fabrizio Lippolis) Date: Thu, 29 Jun 2006 09:47:02 +0200 Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster In-Reply-To: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local> References: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local> Message-ID: <44A38576.4030709@aurigainformatica.it> Leandro Dardini ha scritto: > A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS. > A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using? It's a GFS file system on a disk array. Since I built the cluster for MySQL and ldap services, it's the file system where actually are the database and directory files. The disk array is physically connected to both machines by a SCSI cable. -- Fabrizio Lippolis fabrizio.lippolis at aurigainformatica.it Auriga Informatica s.r.l. 
Via Don Guanella 15/B - 70124 Bari Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/ From ugo.parsi at gmail.com Thu Jun 29 10:05:03 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Thu, 29 Jun 2006 12:05:03 +0200 Subject: [Linux-cluster] Using GULM with CLVM Message-ID: Hello, I am having way too much problems with CMAN/DLM on a midsize cluster (random kernel panics, freezes, random quorum dropping issues, cluster split view, etc..) and just saw that it was more preferable to use GULM for mid to large sized clusters and that CMAN wasn't tested on more than 32 nodes. Actually, I am using the RedHat Cluster Suite only for CLVM, and some people are saying that CLVM+GULM is supported but as I can see on the official 'documentation' : http://sourceware.org/cluster/gulm/gulmusage.txt : 'This document does not cover setting up a block device to run on. Mostly because CLVM doesn't work with gulm yet' What's the current position on that please ? Is the documentation outdated ? Also, my whole infrastructure is totally virtualized (with Xen), and it is also said that it's better to use GULM on a dedicated computer. Anyone tried that on a dedicated virtual machine ? 128 or 256 Megs of RAM should be enough or GULM is ressource hungry ? Thanks, Ugo PARSI -- An apple a day, keeps the doctor away From pcaulfie at redhat.com Thu Jun 29 10:12:32 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 29 Jun 2006 11:12:32 +0100 Subject: [Linux-cluster] Using GULM with CLVM In-Reply-To: References: Message-ID: <44A3A790.8090206@redhat.com> Ugo PARSI wrote: > Hello, > > I am having way too much problems with CMAN/DLM on a midsize cluster > (random kernel panics, freezes, random quorum dropping issues, cluster > split view, etc..) and just saw that it was more preferable to use > GULM for mid to large sized clusters and that CMAN wasn't tested on > more than 32 nodes. > > Actually, I am using the RedHat Cluster Suite only for CLVM, and some > people are saying that CLVM+GULM is supported but as I can see on the > official 'documentation' : > > http://sourceware.org/cluster/gulm/gulmusage.txt : > > 'This document does not cover setting up a block device to run on. > Mostly because CLVM doesn't work with gulm yet' > > What's the current position on that please ? > Is the documentation outdated ? Yes, it is outdated. clvmd does work with gulm. > Also, my whole infrastructure is totally virtualized (with Xen), and > it is also said that it's better to use GULM on a dedicated computer. > Anyone tried that on a dedicated virtual machine ? > 128 or 256 Megs of RAM should be enough or GULM is ressource hungry ? > gulm is resource hungry. Get as much RAM as you can ;-) I'm no expert on gulm, but I would expect that 256MB would not be enough for a cluster of over 32 nodes. -- patrick From ugo.parsi at gmail.com Thu Jun 29 10:39:04 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Thu, 29 Jun 2006 12:39:04 +0200 Subject: [Linux-cluster] Using GULM with CLVM In-Reply-To: <44A3A790.8090206@redhat.com> References: <44A3A790.8090206@redhat.com> Message-ID: Ok thanks for your quick answer :) > > Yes, it is outdated. clvmd does work with gulm. > Ok... since this is undocumented. Are these steps ok ? : -> Start gulm servers. -> Update cluster.conf to remove cman and add gulm servers -> Remove cman from the node startup scripts -> Reboot the whole cluster. 
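For step two above, a rough sketch of what the gulm block that replaces the cman section of cluster.conf can look like (the cluster name, node names and the three-lock-server layout are illustrative assumptions, not taken from the thread; gulm is normally run with 1, 3 or 5 lock servers, and exact attributes may vary between releases):

  <?xml version="1.0"?>
  <cluster name="mycluster" config_version="1">
    <gulm>
      <lockserver name="node01"/>
      <lockserver name="node02"/>
      <lockserver name="node03"/>
    </gulm>
    <clusternodes>
      <clusternode name="node01"/>
      <clusternode name="node02"/>
      <clusternode name="node03"/>
    </clusternodes>
    <fencedevices/>
  </cluster>
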
(I'm not in production yet, so downtime is not a real matter, and I'm trying to deal the transition the easiest way) Nothing has to be changed for LVM / CLVM ? I start / use them the same way ? > > gulm is resource hungry. Get as much RAM as you can ;-) > I'm no expert on gulm, but I would expect that 256MB would not be enough for a > cluster of over 32 nodes. > Ouch ! But gulm is just a central locking server, right ? :) I was more thinking of something like 5 or 10 megs max, LOL :) Thanks, Ugo PARSI -- An apple a day, keeps the doctor away From cjk at techma.com Thu Jun 29 11:25:56 2006 From: cjk at techma.com (Kovacs, Corey J.) Date: Thu, 29 Jun 2006 07:25:56 -0400 Subject: =?us-ascii?Q?RE:_=5BLinux-cluster=5D_two_node_cluster_not_coming_up?= In-Reply-To: <000f01c69b38$42d95110$c88bf330$@com> Message-ID: Just a thought, this sounds like what happens when the /etc/hosts file is not setup correctly. If the hostname of the machines is in the loopback line, then take it out and put a proper entry in. I still fail to understand why the installer doesn't add a proper entry when first installed if a network interface is indeed configured. That's a nother issue tho. Hope this helps... Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of RR Sent: Thursday, June 29, 2006 12:56 AM To: 'linux clustering' Subject: RE: [Linux-cluster] two node cluster not coming up Hi Bob, Attached is the cluster.conf file, and below is the status I get from the command "service cman status" on the working node: Svr00# service cman status Protocol version: 5.0.1 Config version: 4 Cluster name: testcluster Cluster ID: 27453 Cluster Member: Yes Membership state: Cluster-Member Nodes: 1 Expected_votes: 1 Total_votes: 1 Quorum: 1 Active subsystems: 0 Node name: svr00 Node addresses: 10.1.3.64 svr00# clustat Member Status: Quorate Resource Group Manager not running; no service information available. Member Name Status ------ ---- ------ svr00 Online, Local svr01 Offline I rebooted svr01 and now it just sits there at Starting clvmd: during bootup. Hope this helps in anyone understanding my issue? Do I need all the other services configured for this to work properly? i.e. clvmd, fenced, etc. etc. I just wanted to see two nodes in a cluster first before I configured any resources, services, fencing etc etc. \R -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Hi RR, You didn't give us much to go on. When looking into these kinds of problems, it's always nice to see the /etc/cluster/cluster.conf file, and possibly the output of clustat, and cman_tool status from the cluster nodes, in this case, from the node that seemed to work. Regards, Bob Peterson Red Hat Cluster Suite From pcaulfie at redhat.com Thu Jun 29 12:02:54 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 29 Jun 2006 13:02:54 +0100 Subject: [Linux-cluster] Using GULM with CLVM In-Reply-To: References: <44A3A790.8090206@redhat.com> Message-ID: <44A3C16E.6050603@redhat.com> Ugo PARSI wrote: > Ok thanks for your quick answer :) > >> >> Yes, it is outdated. clvmd does work with gulm. >> > > Ok... since this is undocumented. > > Are these steps ok ? : > > -> Start gulm servers. > -> Update cluster.conf to remove cman and add gulm servers > -> Remove cman from the node startup scripts > -> Reboot the whole cluster. 
> > (I'm not in production yet, so downtime is not a real matter, and I'm > trying to deal the transition the easiest way) > > Nothing has to be changed for LVM / CLVM ? > I start / use them the same way ? That's right. clvmd will detect that it's running with gulm rather than cman. >> >> gulm is resource hungry. Get as much RAM as you can ;-) >> I'm no expert on gulm, but I would expect that 256MB would not be >> enough for a >> cluster of over 32 nodes. >> > > Ouch ! > But gulm is just a central locking server, right ? :) > I was more thinking of something like 5 or 10 megs max, LOL :) Well, it depends on how many locks there are obviously. but GFS caches locks for speed so you can end up with quite a lot! -- patrick From ugo.parsi at gmail.com Thu Jun 29 13:12:59 2006 From: ugo.parsi at gmail.com (Ugo PARSI) Date: Thu, 29 Jun 2006 15:12:59 +0200 Subject: [Linux-cluster] Using GULM with CLVM In-Reply-To: <44A3C16E.6050603@redhat.com> References: <44A3A790.8090206@redhat.com> <44A3C16E.6050603@redhat.com> Message-ID: > > > > Nothing has to be changed for LVM / CLVM ? > > I start / use them the same way ? > > That's right. clvmd will detect that it's running with gulm rather than cman. > I've tried but can't figure on how to make it work. Do you know any logs that I could check ? ccsd starts fine. gulmd takes like 1 or 2 secs and seems ok. then when I start clvmd it takes a big time (like 60 seconds). then lvm is stuck/zombie or dies whatever the action I make (not the cannot find socket error) venus:~# ps aux | grep lvm root 2987 0.0 0.9 21072 1256 ? Ss 21:31 0:00 clvmd root 2991 0.0 0.0 0 0 ? Z 21:31 0:00 [lvm] > Well, it depends on how many locks there are obviously. but GFS caches locks > for speed so you can end up with quite a lot! > -- > Okay, but I'm not planning on using GFS anyway at the moment, I'm only using the CLVM part of the cluster package. Thanks, Ugo PARSI -- An apple a day, keeps the doctor away From l.dardini at comune.prato.it Thu Jun 29 13:38:24 2006 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Thu, 29 Jun 2006 15:38:24 +0200 Subject: R: R: [Linux-cluster] "Missed too many heartbeats" messages andhungcluster In-Reply-To: <44A38576.4030709@aurigainformatica.it> Message-ID: <0C5C8B118420264EBB94D7D7050150012A00A8@exchange2.comune.prato.local> > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di > Fabrizio Lippolis > Inviato: gioved? 29 giugno 2006 9.47 > A: linux clustering > Oggetto: Re: R: [Linux-cluster] "Missed too many heartbeats" > messages andhungcluster > > Leandro Dardini ha scritto: > > > A fencing device is required for granting consistency of > write. If one node fails to comunicate with other devices, it > can write in an unconditional mode and bye bye to GFS. > > A fencing device is not only a power-fence device. In my > case it is the fibre channel switch. When a node has to be > fenced, other telnet to the fibre channel switch and turn off > the port. This doesn't powercycle the device, but blocks the > write on the shared device. What kind of shared device are you using? > > It's a GFS file system on a disk array. Since I built the > cluster for MySQL and ldap services, it's the file system > where actually are the database and directory files. The disk > array is physically connected to both machines by a SCSI cable. > Is there a managemente console accessible via telnet/http where you can "disable" a port/host? 
If this is the case, you have already a fencing device. Leandro > -- > Fabrizio Lippolis > fabrizio.lippolis at aurigainformatica.it > Auriga Informatica s.r.l. Via Don Guanella 15/B - > 70124 Bari > Tel.: 080/5025414 - Fax: 080/5027448 - > http://www.aurigainformatica.it/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ramon at vanalteren.nl Thu Jun 29 13:40:48 2006 From: ramon at vanalteren.nl (Ramon van Alteren) Date: Thu, 29 Jun 2006 15:40:48 +0200 Subject: [Linux-cluster] Re: CLVM and AoE Message-ID: <44A3D860.9010802@vanalteren.nl> Hi Aaron, > We're already committed to the AoE route unfortunately, but we're > setting up next week, and I'll keep everyone posted on any performance > benchmarks we glean. I'm working on a similar setup (no Xen do use GFS & coRAIDS) I was wondering whether you have any performance info about this setup and if you fell into any pitfalls along the way. TIA, Ramon From ranjtech at gmail.com Thu Jun 29 13:46:07 2006 From: ranjtech at gmail.com (RR) Date: Thu, 29 Jun 2006 23:46:07 +1000 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: References: <000f01c69b38$42d95110$c88bf330$@com> Message-ID: <000801c69b82$615e1f90$241a5eb0$@com> No, that ain't it. I install CSGFS etc. during my modified Kickstart process and as part of my extended post-install, I fix the /etc/hosts as well and I double checked that and its all good. Anyone else? Bob? Any ideas? -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kovacs, Corey J. Sent: Thursday, June 29, 2006 9:26 PM To: linux clustering Subject: RE: [Linux-cluster] two node cluster not coming up Just a thought, this sounds like what happens when the /etc/hosts file is not setup correctly. If the hostname of the machines is in the loopback line, then take it out and put a proper entry in. I still fail to understand why the installer doesn't add a proper entry when first installed if a network interface is indeed configured. That's a nother issue tho. Hope this helps... Corey -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of RR Sent: Thursday, June 29, 2006 12:56 AM To: 'linux clustering' Subject: RE: [Linux-cluster] two node cluster not coming up Hi Bob, Attached is the cluster.conf file, and below is the status I get from the command "service cman status" on the working node: Svr00# service cman status Protocol version: 5.0.1 Config version: 4 Cluster name: testcluster Cluster ID: 27453 Cluster Member: Yes Membership state: Cluster-Member Nodes: 1 Expected_votes: 1 Total_votes: 1 Quorum: 1 Active subsystems: 0 Node name: svr00 Node addresses: 10.1.3.64 svr00# clustat Member Status: Quorate Resource Group Manager not running; no service information available. Member Name Status ------ ---- ------ svr00 Online, Local svr01 Offline I rebooted svr01 and now it just sits there at Starting clvmd: during bootup. Hope this helps in anyone understanding my issue? Do I need all the other services configured for this to work properly? i.e. clvmd, fenced, etc. etc. I just wanted to see two nodes in a cluster first before I configured any resources, services, fencing etc etc. \R -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Hi RR, You didn't give us much to go on. 
When looking into these kinds of problems, it's always nice to see the /etc/cluster/cluster.conf file, and possibly the output of clustat, and cman_tool status from the cluster nodes, in this case, from the node that seemed to work. Regards, Bob Peterson Red Hat Cluster Suite -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From kanderso at redhat.com Thu Jun 29 14:00:04 2006 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 29 Jun 2006 09:00:04 -0500 Subject: R: [Linux-cluster] "Missed too many heartbeats" messages and hungcluster In-Reply-To: <44A38576.4030709@aurigainformatica.it> References: <0C5C8B118420264EBB94D7D7050150012A0083@exchange2.comune.prato.local> <44A38576.4030709@aurigainformatica.it> Message-ID: <1151589604.2864.4.camel@localhost.localdomain> On Thu, 2006-06-29 at 09:47 +0200, Fabrizio Lippolis wrote: > Leandro Dardini ha scritto: > > > A fencing device is required for granting consistency of write. If one node fails to comunicate with other devices, it can write in an unconditional mode and bye bye to GFS. > > A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, other telnet to the fibre channel switch and turn off the port. This doesn't powercycle the device, but blocks the write on the shared device. What kind of shared device are you using? > > It's a GFS file system on a disk array. Since I built the cluster for > MySQL and ldap services, it's the file system where actually are the > database and directory files. The disk array is physically connected to > both machines by a SCSI cable. > You might be getting lockouts due to the storage subsystem you are using. GFS requires the ability to write/read concurrently from the storage devices and generally overwhelms a direct attached SCSI array. The configuration you describe will not be stable since when one node is accessing the storage, the other machine is completely locked out of the bus. This is probably some of the problems you are having with missing heartbeats. It has been a long time since we have run in that configuration, so not sure of the current behaviors, use fibre channel, iscsi or gnbd as proper storage infrastructure. Kevin From pcaulfie at redhat.com Thu Jun 29 14:03:00 2006 From: pcaulfie at redhat.com (Patrick Caulfield) Date: Thu, 29 Jun 2006 15:03:00 +0100 Subject: [Linux-cluster] Using GULM with CLVM In-Reply-To: References: <44A3A790.8090206@redhat.com> <44A3C16E.6050603@redhat.com> Message-ID: <44A3DD94.3070001@redhat.com> Ugo PARSI wrote: >> > >> > Nothing has to be changed for LVM / CLVM ? >> > I start / use them the same way ? >> >> That's right. clvmd will detect that it's running with gulm rather >> than cman. >> > > I've tried but can't figure on how to make it work. > Do you know any logs that I could check ? > > ccsd starts fine. > gulmd takes like 1 or 2 secs and seems ok. > then when I start clvmd it takes a big time (like 60 seconds). > then lvm is stuck/zombie or dies whatever the action I make (not the > cannot find socket error) > clvmd should log errors to syslog. Check that the gulm cluster is quorate as clvmd won't do anything without a quorate cluster. You might also like to run it with -d and see if any errors appear on stderr. 
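A quick sketch of those checks as shell commands (the log path and grep pattern are illustrative):

  # run clvmd in the foreground; -d sends debug output to stderr
  clvmd -d
  # in another terminal, watch syslog for clvmd/gulm errors
  tail -f /var/log/messages | grep -iE 'clvmd|gulm'
  # also confirm the gulm cluster is quorate before blaming clvmd
  # (gulm_tool can report lock-server and node state; see its man page)
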
-- patrick From rpeterso at redhat.com Thu Jun 29 14:36:28 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Thu, 29 Jun 2006 09:36:28 -0500 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <000f01c69b38$42d95110$c88bf330$@com> References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com> <000f01c69b38$42d95110$c88bf330$@com> Message-ID: <44A3E56C.6010001@redhat.com> RR wrote: > Hi Bob, > > Attached is the cluster.conf file, and below is the status I get from the > command "service cman status" on the working node: > > Svr00# service cman status > Protocol version: 5.0.1 > Config version: 4 > Cluster name: testcluster > Cluster ID: 27453 > Cluster Member: Yes > Membership state: Cluster-Member > Nodes: 1 > Expected_votes: 1 > Total_votes: 1 > Quorum: 1 > Active subsystems: 0 > Node name: svr00 > Node addresses: 10.1.3.64 > > svr00# clustat > Member Status: Quorate > > Resource Group Manager not running; no service information available. > > Member Name Status > ------ ---- ------ > svr00 Online, Local > svr01 Offline > > > I rebooted svr01 and now it just sits there at Starting clvmd: during > bootup. > > Hope this helps in anyone understanding my issue? Do I need all the other > services configured for this to work properly? i.e. clvmd, fenced, etc. etc. > I just wanted to see two nodes in a cluster first before I configured any > resources, services, fencing etc etc. > > \R > Hi RR, Hm. I didn't see anything obviously wrong with your cluster.conf file. I guess I'd reboot svr01 and try to bring it into the cluster manually, and see if it complains about anything along the way. (You may need to bring it up in single-user mode so that it doesn't hang at the service script that starts clvmd) Something like this: modprobe lock_dlm modprobe gfs ccsd cman_tool join -w fence_tool join -w clvmd I'd verify that your communications are sound, that you can ping svr00 from svr01, and that multicast is working. Any reason you went with multicast rather than broadcast? You could see if a broadcast ping (ping -b) would work from svr01 to svr00. Also, you could test to see if your firewall is blocking the IO by temporarily doing "service iptables stop" on both nodes. I'd hope that selinux isn't interfering either, but you could try doing "setenforce 0" just as an experiment to make sure. These are just some ideas. Regards, Bob Peterson Red Hat Cluster Suite From ranjtech at gmail.com Thu Jun 29 14:44:49 2006 From: ranjtech at gmail.com (RR) Date: Fri, 30 Jun 2006 00:44:49 +1000 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <44A3E56C.6010001@redhat.com> References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com> <000f01c69b38$42d95110$c88bf330$@com> <44A3E56C.6010001@redhat.com> Message-ID: <000901c69b8a$94f95c40$beec14c0$@com> Hi Bob, Yeah the communication is all good, can ping each other, in fact I'm scp'ing the cluster.conf file to svr01 and there's nothing else on that network, they might as well be connected through a x-over cable as these two machines are the only machines on the network. I got around the startup hang by starting the OS with the "I" keypress and said No to all cluster services. Once in the OS, I did # service ccsd start # service cman start This works fine on svr00, on svr01 it comes back with [FAILED]. 
When I do, 'service cman status' it says the following [root at svr01 ~]# service cman status Protocol version: 5.0.1 Config version: 4 Cluster name: testcluster Cluster ID: 27453 Cluster Member: No Membership state: Joining When I do a manual: cman_tool join -w I get back a "cman_tool: Node is already active" Also, I load my modules automatically during system startup with S99local. BTW, I do now see the following messages in my /var/log/messages file Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign requested address Jun 29 14:36:40 svr01 kernel: CMAN: sending membership request Jun 29 14:36:41 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign requested address Jun 29 14:36:45 svr01 last message repeated 2 times Jun 29 14:36:45 svr01 kernel: CMAN: sending membership request Jun 29 14:36:47 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign requested address Jun 29 14:36:50 svr01 kernel: CMAN: sending membership request Jun 29 14:37:25 svr01 last message repeated 7 times Does this help? Can I have the application generate more detailed logging? Thanks in advance \R -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Sent: Friday, June 30, 2006 12:36 AM To: linux clustering Subject: Re: [Linux-cluster] two node cluster not coming up RR wrote: > Hi Bob, > > Attached is the cluster.conf file, and below is the status I get from > the command "service cman status" on the working node: > > Svr00# service cman status > Protocol version: 5.0.1 > Config version: 4 > Cluster name: testcluster > Cluster ID: 27453 > Cluster Member: Yes > Membership state: Cluster-Member > Nodes: 1 > Expected_votes: 1 > Total_votes: 1 > Quorum: 1 > Active subsystems: 0 > Node name: svr00 > Node addresses: 10.1.3.64 > > svr00# clustat > Member Status: Quorate > > Resource Group Manager not running; no service information available. > > Member Name Status > ------ ---- ------ > svr00 Online, Local > svr01 Offline > > > I rebooted svr01 and now it just sits there at Starting clvmd: during > bootup. > > Hope this helps in anyone understanding my issue? Do I need all the > other services configured for this to work properly? i.e. clvmd, fenced, etc. etc. > I just wanted to see two nodes in a cluster first before I configured > any resources, services, fencing etc etc. > > \R > Hi RR, Hm. I didn't see anything obviously wrong with your cluster.conf file. I guess I'd reboot svr01 and try to bring it into the cluster manually, and see if it complains about anything along the way. (You may need to bring it up in single-user mode so that it doesn't hang at the service script that starts clvmd) Something like this: modprobe lock_dlm modprobe gfs ccsd cman_tool join -w fence_tool join -w clvmd I'd verify that your communications are sound, that you can ping svr00 from svr01, and that multicast is working. Any reason you went with multicast rather than broadcast? You could see if a broadcast ping (ping -b) would work from svr01 to svr00. Also, you could test to see if your firewall is blocking the IO by temporarily doing "service iptables stop" on both nodes. I'd hope that selinux isn't interfering either, but you could try doing "setenforce 0" just as an experiment to make sure. These are just some ideas. 
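For readability, the manual bring-up sequence suggested above, laid out one command per line (same commands, same order):

  modprobe lock_dlm      # load the DLM locking module
  modprobe gfs           # load the GFS module
  ccsd                   # start the cluster configuration daemon
  cman_tool join -w      # join the cluster and wait for it to complete
  fence_tool join -w     # join the fence domain and wait
  clvmd                  # start clustered LVM
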
Regards, Bob Peterson Red Hat Cluster Suite -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From bmarzins at redhat.com Thu Jun 29 14:59:24 2006 From: bmarzins at redhat.com (Benjamin Marzinski) Date: Thu, 29 Jun 2006 09:59:24 -0500 Subject: [Linux-cluster] GFS locking issues In-Reply-To: <433fd2630606221719q649fc46bv97e94a929c3d5427@mail.gmail.com> References: <433fd2630606141543y69cd7d50xfbcb6fdb347de48e@mail.gmail.com> <20060615190959.GB1913@redhat.com> <433fd2630606160837x1bbe2716pf7b375f42b01cdbd@mail.gmail.com> <20060621175430.GB4706@redhat.com> <433fd2630606221719q649fc46bv97e94a929c3d5427@mail.gmail.com> Message-ID: <20060629145924.GA15061@ether.msp.redhat.com> On Fri, Jun 23, 2006 at 03:19:52AM +0300, Anton Kornev wrote: Anton, It appears that you found a bug in the gnbd code. I have it on my todo list, and I'll get to it as soon as possible. Frankly, I'm really suprised that this hasn't come up earlier. The issue is that once a part of a request has been send to the server, either the whole request must be sent or connection must be dropped. In some cases, such as when the server is non-responsive, but still has an open socket, it is necessary to send a signal break out of the socket transfer and then shutdown the socket. In your case, you simply want the process that is in the middle of the transfer to die. In this case, the appropriate response is to finish sending the IO, and then pass the signal on. I need to look though the code and make sure that I'm doing the appropriate thing for all circumstances. -Ben > David, > > Thanks a lot for your comments. > Actually it sounds rather strange for me. > > I tried to grep the /var/log/messages log with "gnbd" word and found that > there are also > other messages like this even on the working host with no GFS problems. > > bash-3.00# grep gnbd /var/log/messages > Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 9 > Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4) > Jun 19 08:16:20 node1 kernel: gnbd (pid 25756: alogc.pl) got signal 15 > Jun 19 08:16:20 node1 kernel: gnbd0: Send control failed (result -4) > Jun 19 09:51:59 node1 kernel: gnbd (pid 26259: find) got signal 9 > Jun 19 09:51:59 node1 kernel: gnbd0: Send control failed (result -4) > Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 9 > Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4) > Jun 19 10:06:39 node1 kernel: gnbd (pid 313: alogc.pl) got signal 15 > Jun 19 10:06:39 node1 kernel: gnbd0: Send control failed (result -4) > Jun 19 12:51:12 node1 kernel: gnbd (pid 19463: vi) got signal 1 > Jun 19 12:51:12 node1 kernel: gnbd0: Send control failed (result -4) > Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 9 > Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4) > Jun 20 14:48:16 node1 kernel: gnbd (pid 20238: alogc.pl) got signal 15 > Jun 20 14:48:16 node1 kernel: gnbd0: Send control failed (result -4) > > I tried to check gnbd-kernel sources (latest available SRPM - not CVS > version) > and I found that the first message (gnbd ... 
got signal) is produced by > the > sock_xmit() function with the such a piece of code: > > if (signal_pending(current)) { > siginfo_t info; > spin_lock_irqsave(¤t->sighand->siglock, > flags); > printk(KERN_WARNING "gnbd (pid %d: %s) got signal > %d\n", > current->pid, current->comm, > dequeue_signal(current, ¤t->blocked, > &info)); > spin_unlock_irqrestore(¤t->sighand->siglock, > flags); > result = -EINTR; > break; > } > > And the second message is generated inside the gnbd_send_req() by the code > > result = sock_xmit(sock, 1, &request, sizeof(request), > (gnbd_cmd(req) == GNBD_CMD_WRITE)? MSG_MORE: 0); > if (result < 0) { > printk(KERN_ERR "%s: Send control failed (result %d)\n", > dev->disk->disk_name, result); > goto error_out; > } > > So at the first glance it seems like a normal messages from gnbd - if > there is signal received during sock_xmit - don't send anyting and return > -EINTR. > > I am not sure that it might be a problem but I take a look on the > sock_xmit() code and > there are at least two things that seems strange for me. > > 1. There is an inconsistancy between comment and code: > > /* Allow interception of SIGKILL only > * Don't allow other signals to interrupt the transmission */ > spin_lock_irqsave(¤t->sighand->siglock, flags); > oldset = current->blocked; > sigfillset(¤t->blocked); > sigdelsetmask(¤t->blocked, sigmask(SIGKILL) | > sigmask(SIGTERM) | > sigmask(SIGHUP)); > recalc_sigpending(); > spin_unlock_irqrestore(¤t->sighand->siglock, flags); > > So, inside the comment there is a suggestion that only SIGKILL can > interrupt the transmission but the real mask is for KILL/TERM/HUP signals > (btw: in my case it is a SIGTERM who locks everything). > > 2. There are two blocks of code following each other > > if (send) > result = sock_sendmsg(sock, &msg, size); > else > result = sock_recvmsg(sock, &msg, size, 0); > > if (signal_pending(current)) { > siginfo_t info; > spin_lock_irqsave(¤t->sighand->siglock, > flags); > printk(KERN_WARNING "gnbd (pid %d: %s) got signal > %d\n", > current->pid, current->comm, > dequeue_signal(current, ¤t->blocked, > &info)); > spin_unlock_irqrestore(¤t->sighand->siglock, > flags); > result = -EINTR; > break; > } > > Why do we need to return -EINTR as a result if we have already done the > real sock_sendmsg() / sock_recvmsg()? What if the real transmission was > okay and real result has no mistake? > > I am not a kernel developer and I haven't spent a lot of time on the > issue, so it might make no sense at all. > > Please, let me know what do you think about it? > > On 6/21/06, David Teigland <[1]teigland at redhat.com> wrote: > > On Fri, Jun 16, 2006 at 06:37:14PM +0300, Anton Kornev wrote: > > gnbd (pid 5836: alogc.pl) got signal 9 > > gnbd0: Send control failed (result -4) > > gnbd (pid 5836: alogc.pl) got signal 15 > > gnbd0: Send control failed (result -4) > > This and the fact that a number of processes appear to be blocked in the > i/o path seem to point at gnbd as the hold-up. 
> > Dave > > > 51 D wait_on_buffer pdflush > > 5771 D lock_page lock_dlm1 > > 5776 D - gfs_logd > > 5777 D - gfs_quotad > > 5778 D - gfs_inoded > > 5892 D - httpd > > 5895 D glock_wait_internal httpd > > 5896 D glock_wait_internal httpd > > 5897 D glock_wait_internal httpd > > 5911 D glock_wait_internal httpd > > 5915 D wait_on_buffer httpd > > 5930 D wait_on_buffer sh > > > pdflush D ffffffff8014aabc > 0 51 6 53 50 > > (L-TLB) > > 00000100dfc3dc78 0000000000000046 000001011bd3e980 000001010fc11f00 > > 0000000000000216 ffffffffa0042916 000001011aca60c0 > 0000000000000008 > > 000001011fdef7f0 0000000000000dfa > > Call Trace:{:dm_mod:dm_request+396} > > {keventd_create_kthread+0} > > {io_schedule+38} > > {__wait_on_buffer+125} > > {bh_wake_function+0} > > {bh_wake_function+0} > > {:gfs:gfs_logbh_wait+49} > > {:gfs:disk_commit+794} > > {:gfs:log_refund+111} > > {:gfs:log_flush_internal+510} > > {sync_supers+167} > {wb_kupdate+36} > > > > {pdflush+323} {wb_kupdate+0} > > {pdflush+0} {kthread+200} > > {child_rip+8} > > {keventd_create_kthread+0} > > {kthread+0} {child_rip+0} > > lock_dlm1 D 000001000c0096e0 > 0 5771 6 5772 5766 > > (L-TLB) > > 0000010113ce3c58 0000000000000046 0000001000000000 0000010000000069 > > 000001011420b030 0000000000000069 000001000c00a940 > 000000010000eb10 > > 000001011a887030 0000000000001cae > > Call Trace:{__generic_unplug_device+19} > > {io_schedule+38} > > {__lock_page+191} > > {page_wake_function+0} > > {page_wake_function+0} > > {truncate_inode_pages+519} > > {:gfs:gfs_inval_page+63} > > {:gfs:drop_bh+233} > > {:gfs:gfs_glock_cb+194} > > {:lock_dlm:dlm_async+1989} > > {default_wake_function+0} > > {keventd_create_kthread+0} > > {:lock_dlm:dlm_async+0} > > {keventd_create_kthread+0} > > {kthread+200} {child_rip+8} > > {keventd_create_kthread+0} > > {kthread+0} > > {child_rip+0} > > gfs_logd D 0000000000000000 > 0 5776 1 5777 5775 > > (L-TLB) > > 000001011387fe38 0000000000000046 0000000000000000 ffffffff80304a85 > > 000001011387fe58 ffffffff80304add ffffffff803cca80 > 0000000000000246 > > 00000101143fe030 00000000000000b5 > > Call Trace:{thread_return+0} > > {thread_return+88} > > {:gfs:lock_on_glock+112} > > {__down_write+134} > > {:gfs:gfs_ail_empty+56} > > {:gfs:gfs_logd+77} > > {child_rip+8} > > {dummy_d_instantiate+0} > > {:gfs:gfs_logd+0} > {child_rip+0} > > > > gfs_quotad D 0000000000000000 > 0 5777 1 5778 5776 > > (L-TLB) > > 0000010113881e98 0000000000000046 0000000000000000 ffffffff80304a85 > > 0000010113881eb8 ffffffff80304add 000001011ff87030 > 0000000100000074 > > 000001011430f7f0 0000000000000128 > > Call Trace:{thread_return+0} > > {thread_return+88} > > {__down_write+134} > > {:gfs:gfs_quota_sync+226} > > {:gfs:gfs_quotad+127} > > {child_rip+8} > > {dummy_d_instantiate+0} > > {dummy_d_instantiate+0} > > {dummy_d_instantiate+0} > > {:gfs:gfs_quotad+0} > > {child_rip+0} > > gfs_inoded D 0000000000000000 > 0 5778 1 5807 5777 > > (L-TLB) > > 0000010113883e98 0000000000000046 000001011e2937f0 000001000c0096e0 > > 0000000000000000 ffffffff80304a85 0000010113883ec8 > 0000000180304add > > 000001011e2937f0 00000000000000c2 > > Call Trace:{thread_return+0} > > {__down_write+134} > > {:gfs:unlinked_find+115} > > {:gfs:gfs_unlinked_dealloc+25} > > {:gfs:gfs_inoded+66} > > {child_rip+8} > > {:gfs:gfs_inoded+0} > {child_rip+0} > > > > > > httpd D ffffffff80304190 > 0 5892 1 5893 5826 > > (NOTLB) > > 0000010111b75bf8 0000000000000002 0000000000000001 0000000000000001 > > 0000000000000000 0000000000000000 0000010114667980 > 0000000111b75bc0 > > 00000101143fe7f0 
00000000000009ad > > Call Trace:{__down+147} > > {default_wake_function+0} > > {generic_file_write_nolock+158} > > {__down_failed+53} > > {:gfs:.text.lock.dio+95} > > {:gfs:gfs_trans_add_bh+205} > > {:gfs:do_write_buf+1138} > > {:gfs:walk_vm+278} > > {:gfs:do_write_buf+0} > > {:gfs:do_write_buf+0} > > {:gfs:__gfs_write+201} > > {vfs_write+207} > > {sys_write+69} > {system_call+126} > > > > httpd D 0000010110ad7d48 0 5895 > 5892 5896 5893 > > (NOTLB) > > 0000010110ad7bd8 0000000000000006 000001011b16e030 0000000000000075 > > 0000010117002030 0000000000000075 000001000c002940 > 0000000000000001 > > 00000101170027f0 000000000001300e > > Call Trace:{try_to_wake_up+863} > > {wait_for_completion+167} > > {default_wake_function+0} > > {default_wake_function+0} > > {:gfs:glock_wait_internal+350} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {:gfs:gfs_private_nopage+84} > > {do_no_page+1003} > > {do_wp_page+948} > > {handle_mm_fault+343} > > {get_signal_to_deliver+1118} > > {do_page_fault+518} > > {thread_return+0} > > {thread_return+88} > {error_exit+0} > > > > > > httpd D 0000010110b5bd48 0 5896 > 5892 5897 5895 > > (NOTLB) > > 0000010110b5bbd8 0000000000000002 00000101170027f0 0000000000000075 > > 00000101114787f0 0000000000000075 000001000c002940 > 0000000000000001 > > 0000010117002030 000000000000fb3e > > Call Trace:{try_to_wake_up+863} > > {wait_for_completion+167} > > {default_wake_function+0} > > {default_wake_function+0} > > {:gfs:glock_wait_internal+350} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {:gfs:gfs_private_nopage+84} > > {do_no_page+1003} > > {do_wp_page+948} > > {handle_mm_fault+343} > > {get_signal_to_deliver+1118} > > {do_page_fault+518} > > {sys_accept+327} > > {pipe_read+26} > {error_exit+0} > > > > httpd D 0000000000000000 0 5897 > 5892 5911 5896 > > (NOTLB) > > 0000010110119bd8 0000000000000006 0000010117002030 0000000000000075 > > 0000010117002030 0000000000000075 000001000c00a940 > 000000001b16e030 > > 00000101114787f0 000000000000fbe0 > > Call Trace:{__generic_unplug_device+19} > > {wait_for_completion+167} > > {default_wake_function+0} > > {default_wake_function+0} > > {:gfs:glock_wait_internal+350} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {:gfs:gfs_private_nopage+84} > > {do_no_page+1003} > > {do_wp_page+948} > > {handle_mm_fault+343} > > {get_signal_to_deliver+1118} > > {do_page_fault+518} > > {thread_return+0} > > {thread_return+88} > {error_exit+0} > > > > > > httpd D 00000101100c3d48 0 5911 > 5892 5915 5897 > > (NOTLB) > > 00000101100c3bd8 0000000000000002 000001011420b7f0 0000000000000075 > > 00000101170027f0 0000000000000075 000001000c002940 > 0000000000000000 > > 000001011b16e030 000000000000187e > > Call Trace:{try_to_wake_up+863} > > {wait_for_completion+167} > > {default_wake_function+0} > > {default_wake_function+0} > > {:gfs:glock_wait_internal+350} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {:gfs:gfs_private_nopage+84} > > {do_no_page+1003} > > {do_wp_page+948} > > {handle_mm_fault+343} > > {get_signal_to_deliver+1118} > > {do_page_fault+518} > > {thread_return+0} > > {thread_return+88} > {error_exit+0} > > > > > > httpd D 0000000000006a36 0 5915 > 5892 5911 > > (NOTLB) > > 00000101180f7ad8 0000000000000006 0000000000002706 ffffffffa020c791 > > 0000000000000000 0000000000000000 0000030348ac8c1c > 0000000114a217f0 > > 0000010114c997f0 000000000000076a > > Call Trace:{:dlm:lkb_swqueue+43} > > {io_schedule+38} > > {__wait_on_buffer+125} > > {bh_wake_function+0} > > 
{bh_wake_function+0} > > {:gfs:gfs_dreread+154} > > {:gfs:gfs_dread+40} > > {:gfs:gfs_get_meta_buffer+201} > > {:gfs:gfs_copyin_dinode+23} > > {:gfs:inode_go_lock+38} > > {:gfs:glock_wait_internal+563} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {:gfs:gfs_private_nopage+84} > > {do_no_page+1003} > > {do_wp_page+948} > > {handle_mm_fault+343} > > {get_signal_to_deliver+1118} > > {do_page_fault+518} > > {thread_return+0} > > {thread_return+88} > {error_exit+0} > > > > > > sh D 000000000000001a 0 5930 2547 > > (NOTLB) > > 000001011090f8e8 0000000000000002 0000010111293d88 0000010110973d00 > > 0000010111293d88 0000000000000000 00000100dfc02400 > 0000000000010000 > > 00000101148557f0 0000000000002010 > > Call Trace:{io_schedule+38} > > {__wait_on_buffer+125} > > {bh_wake_function+0} > > {bh_wake_function+0} > > {:gfs:gfs_dreread+154} > > {:gfs:gfs_dread+40} > > {:gfs:gfs_get_meta_buffer+201} > > {:gfs:gfs_copyin_dinode+23} > > {:gfs:inode_go_lock+38} > > {:gfs:glock_wait_internal+563} > > {:gfs:gfs_glock_nq+961} > > {:gfs:gfs_glock_nq_init+20} > > {dummy_inode_permission+0} > > {:gfs:gfs_permission+64} > > {dput+56} {permission+51} > > {__link_path_walk+372} > > {link_path_walk+82} > > {do_page_fault+575} > > {__link_path_walk+1658} > > {link_path_walk+82} > > {do_page_fault+575} > > {path_lookup+451} > > {__user_walk+47} > > {vfs_stat+24} > {do_page_fault+575} > > > > {sys_newstat+17} > {error_exit+0} > > {system_call+126} > > -- > Best Regards, > Anton Kornev. > > References > > Visible links > 1. mailto:teigland at redhat.com From rpeterso at redhat.com Thu Jun 29 15:34:44 2006 From: rpeterso at redhat.com (Robert Peterson) Date: Thu, 29 Jun 2006 10:34:44 -0500 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <000901c69b8a$94f95c40$beec14c0$@com> References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com> <000f01c69b38$42d95110$c88bf330$@com> <44A3E56C.6010001@redhat.com> <000901c69b8a$94f95c40$beec14c0$@com> Message-ID: <44A3F314.7030203@redhat.com> RR wrote: > Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign > requested address Hi RR, These messages means that svr01 tried to send a broadcast/multicast message to the socket, but the underlying communications layer returned an error. Perhaps you can try it without the line: In your cluster.conf. Regards, Bob Peterson Red Hat Cluster Suite From djkast at gmail.com Thu Jun 29 15:42:19 2006 From: djkast at gmail.com (DJ-Kast .) Date: Thu, 29 Jun 2006 11:42:19 -0400 Subject: [Linux-cluster] I am getting the following error Message-ID: Jun 29 11:36:06 jtest1 clurgmgrd[5217]: Resource Group Manager Starting Jun 29 11:36:06 jtest1 clurgmgrd[5217]: Loading Service Data Jun 29 11:36:06 jtest1 clurgmgrd[5217]: Initializing Services Jun 29 11:36:06 jtest1 clurgmgrd[5217]: Services Initialized Jun 29 11:37:17 jtest1 clurgmgrd[5217]: Logged in SG "usrm::manager" Jun 29 11:37:17 jtest1 clurgmgrd[5217]: Magma Event: Membership Change Jun 29 11:37:17 jtest1 clurgmgrd[5217]: State change: Local UP Jun 29 11:37:47 jtest1 clurgmgrd[5217]: Node ID:0000000000000001 stuck with lock usrm::rg="MountME" Jun 29 11:38:47 jtest1 last message repeated 2 times Jun 29 11:39:13 jtest1 clurgmgrd[5217]: State change: vps3 UP Jun 29 11:39:13 jtest1 clurgmgrd[5217]: State change: vps1 UP MountME is the name of my service The resources include IP and NFS mount I have 3 nodes.. If node 1 has ownership and is running the NFS Mount and i issue a umount... 
No other nodes recover -------------- next part -------------- An HTML attachment was scrubbed... URL: From DERRICK.BEERY at iowa.gov Thu Jun 29 16:28:13 2006 From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS]) Date: Thu, 29 Jun 2006 11:28:13 -0500 Subject: [Linux-cluster] cluster-1.02.00 make errors Message-ID: <4D9680752635E9448FF261A1443DD293033C14CC@iowadsmex04.iowa.gov.state.ia.us> Any ideas on this one? gnbd_export.c:26:25: error: sysfs/dlist.h: No such file or directory gnbd_export.c:27:28: error: sysfs/libsysfs.h: No such file or directory gnbd_export.c: In function $(B!F(Bget_sysfs_name -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Frank Hellmann Sent: Thursday, June 29, 2006 2:18 AM To: linux clustering Subject: Re: [Linux-cluster] cluster-1.02.00 make errors Hi Derrick, Make sure you build it this way: $ ./configure --kernel_src=/path/to/linux-2.6.x $ make install the usual make; make install won't work for various reasons. Just do a make install. Cheers, Frank... Ryan O'Hara wrote: > > Derrick, > > I was able to build the cluster-1.02.00 code against the vanilla > 2.6.16 kernel. Did you run the configure script with the --kernel_src > option to point to the correct kernel tree? > > ./configure --kernel_src=/path/to/kernel > > Ryan > > > > Beery, Derrick [DAS] wrote: > >> Make against 2.6.16 vanilla kernel is failing with the following errors: >> >> >> >> CC [M] /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_get': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:72: too few >> arguments to function `permission' >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_set': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:95: too few >> arguments to function `permission' >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c: In >> function `user_eo_remove': >> >> /usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.c:120: too >> few arguments to function `permission' >> >> make[5]: *** >> [/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs/eaops.o] Error 1 >> >> make[4]: *** >> [_module_/usr/local/src/cluster-1.02.00/gfs-kernel/src/gfs] Error 2 >> >> >> >> Can anyone shed some light on this? >> >> >> Thanks in advance! 
>> >> >> >> Derrick Beery >> >> DAS ITE State of Iowa >> >> >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- ------------------------------------------------------------------------ -- Frank Hellmann Optical Art GmbH Waterloohain 7a DI Supervisor http://www.opticalart.de 22769 Hamburg frank at opticalart.de Tel: ++49 40 5111051 Fax: ++49 40 43169199 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From ranjtech at gmail.com Thu Jun 29 16:28:13 2006 From: ranjtech at gmail.com (RR) Date: Fri, 30 Jun 2006 02:28:13 +1000 Subject: [Linux-cluster] two node cluster not coming up In-Reply-To: <44A3F314.7030203@redhat.com> References: <000001c69ad9$a462d8c0$ed288a40$@com> <44A2E6C5.5080802@redhat.com> <000f01c69b38$42d95110$c88bf330$@com> <44A3E56C.6010001@redhat.com> <000901c69b8a$94f95c40$beec14c0$@com> <44A3F314.7030203@redhat.com> Message-ID: <001601c69b99$06c98710$145c9530$@com> Bob, mate, you've done it. Obv. Had to be a network related issue and I should've thought of it since I was getting nada on svr00 when I tried capturing packets from svr01. Didn't know what parameter to change. Needed to reboot the machine and manually start the services but it's now happy I think. I'm pretty sure the bonded Ethernet interfaces support multicast. Not sure why it's getting rejected. My iptables and selinux are both disabled by default during install. BTW, What's the consequence of my removing the multicast address? Will it have a consequence in using DLM? Does it do Broadcast by default? Ok, so guess I'll move down the list of steps to get this moving with resources and services. I'm assuming I can have an active-active two node cluster? Thanks again \R -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Robert Peterson Sent: Friday, June 30, 2006 1:35 AM To: linux clustering Subject: Re: [Linux-cluster] two node cluster not coming up RR wrote: > Jun 29 14:36:39 svr01 ccsd[3685]: Unable to perform sendto: Cannot assign > requested address Hi RR, These messages means that svr01 tried to send a broadcast/multicast message to the socket, but the underlying communications layer returned an error. Perhaps you can try it without the line: In your cluster.conf. Regards, Bob Peterson Red Hat Cluster Suite -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From DERRICK.BEERY at iowa.gov Thu Jun 29 18:26:53 2006 From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS]) Date: Thu, 29 Jun 2006 13:26:53 -0500 Subject: [Linux-cluster] cluster-1.02.00 make errors Message-ID: <4D9680752635E9448FF261A1443DD293033C14CD@iowadsmex04.iowa.gov.state.ia.us> Looks like it just needed sysfsutils-devel. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick [DAS] Sent: Thursday, June 29, 2006 11:28 AM To: linux clustering Subject: RE: [Linux-cluster] cluster-1.02.00 make errors Any ideas on this one? 
From DERRICK.BEERY at iowa.gov Thu Jun 29 18:26:53 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Thu, 29 Jun 2006 13:26:53 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14CD@iowadsmex04.iowa.gov.state.ia.us>

Looks like it just needed sysfsutils-devel.

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick [DAS]
Sent: Thursday, June 29, 2006 11:28 AM
To: linux clustering
Subject: RE: [Linux-cluster] cluster-1.02.00 make errors

Any ideas on this one?

gnbd_export.c:26:25: error: sysfs/dlist.h: No such file or directory
gnbd_export.c:27:28: error: sysfs/libsysfs.h: No such file or directory
gnbd_export.c: In function `get_sysfs_name'

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
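For completeness, a quick way to confirm the same missing-header problem is sketched below. Only the package name sysfsutils-devel comes from Derrick's message; the package-manager command is an assumption about the build host.

# gnbd_export.c includes <sysfs/dlist.h> and <sysfs/libsysfs.h>; check they are installed
ls /usr/include/sysfs/dlist.h /usr/include/sysfs/libsysfs.h
# install the headers, then re-run configure and make install
yum install sysfsutils-devel    # or the equivalent package for your distribution
                                # (e.g. libsysfs-dev on Debian-style systems)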
From DERRICK.BEERY at iowa.gov Thu Jun 29 18:36:37 2006
From: DERRICK.BEERY at iowa.gov (Beery, Derrick [DAS])
Date: Thu, 29 Jun 2006 13:36:37 -0500
Subject: [Linux-cluster] cluster-1.02.00 make errors
Message-ID: <4D9680752635E9448FF261A1443DD293033C14CE@iowadsmex04.iowa.gov.state.ia.us>

It also seems that cluster-1.02.00 cannot be built against a kernel including OpenVZ for some reason. Any ideas why this would be?

Thanks,
Derrick

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Beery, Derrick [DAS]
Sent: Thursday, June 29, 2006 1:27 PM
To: linux clustering
Subject: RE: [Linux-cluster] cluster-1.02.00 make errors

Looks like it just needed sysfsutils-devel.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
From ugo.parsi at gmail.com Thu Jun 29 22:53:41 2006
From: ugo.parsi at gmail.com (Ugo PARSI)
Date: Fri, 30 Jun 2006 00:53:41 +0200
Subject: [Linux-cluster] Using GULM with CLVM
In-Reply-To: <44A3DD94.3070001@redhat.com>
References: <44A3A790.8090206@redhat.com> <44A3C16E.6050603@redhat.com> <44A3DD94.3070001@redhat.com>
Message-ID:

> clvmd should log errors to syslog. Check that the gulm cluster is quorate as
> clvmd won't do anything without a quorate cluster. You might also like to run
> it with -d and see if any errors appear on stderr.

Yes, you're right, thanks: if clvmd and lvm are stuck, it is because gulm is inquorate and simply doesn't work at all...

But I still can't figure out how to make it work. I've spent a lot of hours on it now, and all of my problems seem to be IPv4/IPv6/hostname related (I guess).

To ease configuration and testing, I've reduced my cluster.conf to the simplest case (I guess): just 2 client nodes and 1 master gulm node. All of them are on the 10.x.x.x private IPv4 subnet.

Again, I don't know whether I can trust the 'documentation' or not: in one place it is written that gulm works on IPv6 sockets only, while the man pages (man lock_gulmd) suggest that both IPv4 and IPv6 sockets are handled by GULM.

My first problem is this one:

venus:/etc/init.d# lock_gulmd --use_ccs
Warning! You didn't specify a cluster name before --use_ccs
 Letting ccsd choose which cluster we belong to.
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list. You specified 0
I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list. You specified 0
venus:/etc/init.d# I cannot find the name for ip "::ffff:10.1.1.5". Stopping.
Gulm requires 1,3,4, or 5 nodes to be specified in the servers list. You specified 0

Apparently, GULM insists on some kind of IPv4-mapped IPv6 address that it can't find anywhere on the system.
Here's my /etc/hosts:
----------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
10.1.1.5        venus

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
------------------------------------------

And here's my cluster.conf:
-----------------------------------------------------------
venus:/etc/init.d# cat /etc/cluster/cluster.conf

---------------------------------------------------------

So, in order to force its host-matching process, I've modified my /etc/hosts like this:

------------------------------------------
venus:/etc/init.d# cat /etc/hosts
127.0.0.1       localhost.localdomain   localhost
10.1.1.5        venus
::ffff:10.1.1.5 venus

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
-------------------------------------------

With that one, it *SEEMED* to work (it no longer prints messages at runtime and silently forks the daemon), but the logs show that it still cannot find its own IP or hostname:

Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am (venus) with ip (::) :(

Here's the whole part:

Jun 30 00:19:01 venus lock_gulmd_main[26211]: Forked lock_gulmd_core.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: Starting lock_gulmd_core 1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am running in Standard mode.
Jun 30 00:19:01 venus lock_gulmd_core[26215]: I am (venus) with ip (::)
Jun 30 00:19:01 venus lock_gulmd_core[26215]: This is cluster iliona
Jun 30 00:19:01 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198 ::1 idx:1 fd:6)
Jun 30 00:19:02 venus lock_gulmd_main[26211]: Forked lock_gulmd_LT.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: Starting lock_gulmd_LT 1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am running in Standard mode.
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: I am (venus) with ip (::)
Jun 30 00:19:02 venus lock_gulmd_LT[26219]: This is cluster iliona
Jun 30 00:19:02 venus lock_gulmd_LT000[26219]: Not serving locks from this node.
Jun 30 00:19:02 venus lock_gulmd_core[26215]: EOF on xdr (Magma::26198 ::1 idx:1 fd:6)
Jun 30 00:19:03 venus lock_gulmd_main[26211]: Forked lock_gulmd_LTPX.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: Starting lock_gulmd_LTPX 1.02.00. (built Jun 23 2006 18:56:19) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am running in Standard mode.
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: I am (venus) with ip (::)
Jun 30 00:19:03 venus lock_gulmd_LTPX[26223]: This is cluster iliona
Jun 30 00:19:03 venus ccsd[26197]: Connected to cluster infrastruture via: GuLM Plugin v1.0.4
Jun 30 00:19:03 venus ccsd[26197]: Initial status:: Inquorate

And indeed it's not acting as a 'Server/Master' but as a 'Client':

venus:/etc/init.d# gulm_tool getstats venus
I_am = Client
quorum_has = 1
quorum_needs = 1
rank = -1
quorate = false
GenerationID = 0
run time = 128
pid = 27456
verbosity = Default
failover = disabled
venus:/etc/init.d#

Of course the other 2 nodes are acting the same way, and with no master the cluster is always in an inquorate/unusable state, hence my problems with clvmd/lvm.

I've tried many other things, like putting names instead of IPs inside cluster.conf (with the names in /etc/hosts, or DNS-based only, etc.), but I still get the same error.

I am getting really confused by the whole system, and the lack of documentation makes it really painful for a cluster-suite beginner like me to find my mistakes :/

If you have any ideas :),

Thanks a lot,

Ugo PARSI

--
An apple a day, keeps the doctor away
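The "servers list ... You specified 0" message suggests that lock_gulmd is not picking up any lock-server entries from ccsd at all, so before worrying about IPv4 vs IPv6 it is worth checking the <gulm> stanza in cluster.conf. The fragment below is only a sketch of what RHCS 4 expects; the server name is illustrative, and with a single lock server failover stays disabled (exactly as gulm_tool reports above), while three servers are needed for any redundancy, a majority of them being required for quorum. Each name listed should match what the node calls itself (uname -n) and resolve to the node's real IPv4 address, e.g. via the existing 10.1.1.5 entry in /etc/hosts; a name that only maps to a ::ffff: form or to 127.0.0.1 is one way to end up with lock_gulmd announcing itself as "(venus) with ip (::)".

venus# grep -A2 '<gulm>' /etc/cluster/cluster.conf
<gulm>
        <lockserver name="venus"/>
</gulm>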
From jason at monsterjam.org Fri Jun 30 22:50:10 2006
From: jason at monsterjam.org (Jason)
Date: Fri, 30 Jun 2006 18:50:10 -0400
Subject: [Linux-cluster] newbie questions
Message-ID: <20060630225010.GA3972@monsterjam.org>

So I have a 2-node cluster I'm setting up, and I'm trying to use /usr/bin/system-config-cluster. I'm setting up my nodes and setting up fencing, and for my AP7900 I've got box1 plugged into ports 1,2 and box2 plugged into ports 3,4.

First question: I don't see how to set up multiple fence ports for each box.

Second question: what the heck does it want on the "edit fence properties" dialog? It says port, and I understand that, but when it asks for switch, what does it want there? It seems to want a number, but it takes an IP address as well. Not sure.

regards,
Jason
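On the first question, the usual pattern for a node with two power feeds is to list both outlets inside the same fence method, so a single fence operation switches both; the resulting cluster.conf fragment looks roughly like the sketch below (the device name, node name and ports are placeholders, not Jason's actual setup, and the "apc" device is assumed to be defined in the <fencedevices> section). Some setups go further and use explicit off/off followed by on/on entries so both outlets are guaranteed to be off at the same moment. On the second question, the "switch" field is, as far as I recall, only meant for installations where several APC units are chained behind one address, so for a single AP7900 it can normally be left alone.

box1# grep -A7 '<clusternode name="box1"' /etc/cluster/cluster.conf
<clusternode name="box1" votes="1">
        <fence>
                <method name="1">
                        <device name="apc" port="1"/>
                        <device name="apc" port="2"/>
                </method>
        </fence>
</clusternode>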