From dma+linuxcluster at witbe.net Tue Apr 1 09:32:18 2008 From: dma+linuxcluster at witbe.net (Daniel Maher) Date: Tue, 1 Apr 2008 11:32:18 +0200 Subject: [Linux-cluster] (newbie) mirrored data / cluster ? In-Reply-To: <47F138E7.4000903@cmiware.com> References: <20080331194027.101fdf09@danstation> <9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU> <47F138E7.4000903@cmiware.com> Message-ID: <20080401113218.45dd3296@danstation> On Mon, 31 Mar 2008 14:17:59 -0500 Chris Harms wrote: > The non-SAN option would be to use DRBD (http://www.drbd.org) and put > NFS, Samba, etc on top of the DRBD partition. Thank you for your reply. On this topic, consider this paper by Lars Ellenberg : http://www.drbd.org/fileadmin/drbd/publications/drbd8.linux-conf.eu.2007.pdf Where he notes the following : "The most inconvenient limitations is currently that DRBD supports only two nodes natively." While this is not a problem in my theoretical two-server setup, should we wish to add a third server in the future (which i find highly likely), then DRBD will no longer be an appropriate solution. Furthermore, that same paper seems to suggest that DRBD is best used in a primary / secondary relationship, whereas i'm suggesting an "all-primary" sort of setup. -- Daniel Maher -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From dma+linuxcluster at witbe.net Tue Apr 1 09:41:22 2008 From: dma+linuxcluster at witbe.net (Daniel Maher) Date: Tue, 1 Apr 2008 11:41:22 +0200 Subject: [Linux-cluster] (newbie) mirrored data / cluster ? In-Reply-To: <9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU> References: <20080331194027.101fdf09@danstation> <9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU> Message-ID: <20080401114122.24df5244@danstation> On Mon, 31 Mar 2008 13:57:46 -0500 "MARTI, ROBERT JESSE" wrote: > You don't have to have a mirrored LVM to do what youre trying to do. > You just need a common mountable share - typically a SAN or NAS. It > shouldn't be too hard to configure (and I've already done it). You > don't even *have* to have cluster suite - if you have a load balancer. > My brain isn't fast enough today to figure out how to share a load > without a load balanced VIP or a DNS round robin (which should be easy > to do as well). Thank you for your reply. As for your suggestion of having a common mountable share - well, yes, that's exactly what i'm trying to do. I want to take to servers, and create a NAS device from them. I don't already have a load balancer, but using RRDNS is straightforward enough. The other aspect of this initiative is to gain some useful applicative experience with cluster suite, as we'd like to clusterise our front-end web servers down the road as well. -- Daniel Maher -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From gordan at bobich.net Tue Apr 1 09:50:10 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Tue, 1 Apr 2008 10:50:10 +0100 (BST) Subject: [Linux-cluster] (newbie) mirrored data / cluster ? 
In-Reply-To: <20080401113218.45dd3296@danstation> References: <20080331194027.101fdf09@danstation> <9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU> <47F138E7.4000903@cmiware.com> <20080401113218.45dd3296@danstation> Message-ID: On Tue, 1 Apr 2008, Daniel Maher wrote: >> The non-SAN option would be to use DRBD (http://www.drbd.org) and put >> NFS, Samba, etc on top of the DRBD partition. > > On this topic, consider this paper by Lars Ellenberg : > http://www.drbd.org/fileadmin/drbd/publications/drbd8.linux-conf.eu.2007.pdf > > Where he notes the following : > "The most inconvenient limitations is currently that DRBD supports only > two nodes natively." I'm not 100% sure, but I think this limit is increased in latest 8.0 and 8.2 releases. > While this is not a problem in my theoretical two-server setup, should > we wish to add a third server in the future (which i find highly > likely), then DRBD will no longer be an appropriate solution. I'd double check that this is still a limitation. Ask on the DRBD list. > Furthermore, that same paper seems to suggest that DRBD is best used in > a primary / secondary relationship, whereas i'm suggesting an > "all-primary" sort of setup. That is the way it has been used traditionally with DRBD <= 7.x, but for a while now primary/primary operation has been fully supported (obviously, you need to use a FS that is aware of such things, such as GFS(2) or OCFS(2)). Gordan From Danny.Wall at health-first.org Tue Apr 1 16:45:28 2008 From: Danny.Wall at health-first.org (Danny Wall) Date: Tue, 01 Apr 2008 12:45:28 -0400 Subject: [Linux-cluster] Using GFS and DLM without RHCS Message-ID: <47F22E68020000C800008B81@mail-int.health-first.org> Danny Wall wrote: > I was wondering if it is possible to run GFS on several machines with a > shared GFS LUN, but not use full clustering like RHCS. From the FAQs: > First of all, what's the problem with having RHCS running? It doesn't > mean you have to use it to handle resources failing over. You can run > it all in active/active setup with load balancing in front. I was looking to minimize everything as much as possible, so if it is not needed, do not install it. This would reduce problems with updates and overall management. Having said that, your solution is still a better alternative for my needs, and options like this are what I am looking for. Thanks > If this is not an acceptable solution for you and you still cannot be > bothered to create cluster.conf (and that is all that is required), > you can always use OCFS2. This doesn't have a cluster component (it's > totally unrelated to RHCS), but you still have to create the > equivalent config, so you won't be saving yourself any effort. > Gordan OCFS is out of he question. OCFS can not handle the number of files and directories on these servers. I don't technically need a cluster, but the cluster filesystem allows me to have multiple servers with access to the storage at the same time, reducing downtime, and allowing for processes like backups to run on a different server and not overload the server used by the end users. If I can implement this without the cluster, it will reduce complexity. Some of the problems I have seen recently include the cluster failing to relocate resources, and not properly fencing a node. Thanks Danny ##################################### This message is for the named person's use only. It may contain private, proprietary, or legally privileged information. No privilege is waived or lost by any mistransmission. 
If you receive this message in error, please immediately delete it and all copies of it from your system, destroy any hard copies of it, and notify the sender. You must not, directly or indirectly, use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. Health First reserves the right to monitor all e-mail communications through its networks. Any views or opinions expressed in this message are solely those of the individual sender, except (1) where the message states such views or opinions are on behalf of a particular entity; and (2) the sender is authorized by the entity to give such views or opinions. ##################################### From garromo at us.ibm.com Tue Apr 1 17:38:10 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 1 Apr 2008 11:38:10 -0600 Subject: [Linux-cluster] VIP's on mixed subnets Message-ID: In my cluster all of my servers NICs are bonded. Up until recently all of my VIPs (for resources/services) were in the same subnet. Is it ok that VIPs be in mixed subnets? Thanks. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsucharz at poczta.onet.pl Tue Apr 1 19:29:35 2008 From: tsucharz at poczta.onet.pl (Tomasz Sucharzewski) Date: Tue, 1 Apr 2008 21:29:35 +0200 Subject: [Linux-cluster] (newbie) mirrored data / cluster ? In-Reply-To: <47F138E7.4000903@cmiware.com> References: <20080331194027.101fdf09@danstation> <9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU> <47F138E7.4000903@cmiware.com> Message-ID: <20080401212935.726ee726.tsucharz@poczta.onet.pl> Hello, BTW do you know any software solution that supports asynchronous replication on Linux like AVS on Solaris ? Best regards, Tomek On Mon, 31 Mar 2008 14:17:59 -0500 Chris Harms wrote: > The non-SAN option would be to use DRBD (http://www.drbd.org) and put > NFS, Samba, etc on top of the DRBD partition. > > Chris > > MARTI, ROBERT JESSE wrote: > > You don't have to have a mirrored LVM to do what youre trying to do. > > You just need a common mountable share - typically a SAN or NAS. It > > shouldn't be too hard to configure (and I've already done it). You > > don't even *have* to have cluster suite - if you have a load balancer. > > My brain isn't fast enough today to figure out how to share a load > > without a load balanced VIP or a DNS round robin (which should be easy > > to do as well). > > > > Rob Marti > > Systems Analyst II > > Sam Houston State University > > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Daniel Maher > > Sent: Monday, March 31, 2008 12:40 PM > > To: linux-cluster at redhat.com > > Subject: [Linux-cluster] (newbie) mirrored data / cluster ? > > > > Hello all, > > > > I have spent the day reading through the mailing list archives, Redhat > > documentation, and CentOS forums, and - to be frank - my head is now > > swimming with information. > > > > My scenario seems reasonably straightforward : I would like to have two > > file servers which mirror each others' data, then i'd like those two > > servers to act as a cluster, whereby they serve said data as if they > > were one machine. If one of the servers suffers a critical failure, the > > other will stay up, and the data will continue to be accessible to the > > rest of the network. 
> > > > I note with some trepidation that this might not be possible, as per > > this document : > > http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/en-US/RHEL51 > > 0/Cluster_Logical_Volume_Manager/mirrored_volumes.html > > > > However, i don't know if that document relates to the same scenario i've > > described above. I would very much appreciate any and all feedback, > > links to further documentation, and any other information that you might > > like to share. > > > > Thank you ! > > > > > > -- > > Daniel Maher > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Tomasz Sucharzewski From anujhere at gmail.com Tue Apr 1 21:23:37 2008 From: anujhere at gmail.com (=?UTF-8?Q?=E0=A4=85=E0=A4=A8=E0=A5=81=E0=A4=9C_Anuj_Singh?=) Date: Wed, 2 Apr 2008 02:53:37 +0530 Subject: [Linux-cluster] distributed file system... can we achieve effectively using linux Message-ID: <3120c9e30804011423u19670332lba8bfe82b4543066@mail.gmail.com> Hi, How can we create a common Q drive using linux that meets the following needs ? It should be possible to logically the common Q drive into smaller partitions, each managed by a custodian The custodian of a partition, should be able to monitor and control the usage of a partition. Presently, Q drives are used as shared folders at different locations over WAN. (so the network traffic and server load will be a factor, not all the files of Q - drive is required among locations.) Present Q drives are on windows platform. Do we have a better option over microsoft's DFS? Thanks and Regards Anuj -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew at ntsg.umt.edu Wed Apr 2 00:19:15 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Tue, 1 Apr 2008 18:19:15 -0600 (MDT) Subject: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel Message-ID: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> I have a GFS cluster with one node serving files via smb and nfs. Under fairly light usage (5-10 users) the cpu is getting pounded by dlm. I am using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). This sounds like the dlm issue mentioned back in March of last year (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html) that was resolved in 2.6.21. Has (or will) this fix be back ported to the current el5 kernel? Will it be in RHEL5.2? What is the easiest way for me to get this fix? Also, if I try a newer kernel on this node, will there be any harm in the other nodes using their current kernel? Thanks, -Andrew -- Andrew A. Neuschwander, RHCE Linux Systems Administrator Numerical Terradynamic Simulation Group College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 From david at eciad.ca Wed Apr 2 00:30:56 2008 From: david at eciad.ca (David Ayre) Date: Tue, 1 Apr 2008 17:30:56 -0700 Subject: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel In-Reply-To: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> References: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> Message-ID: What do you mean by pounded exactly ? We have an ongoing issue, similar... 
when we have about a dozen users using both smb/nfs, and at some seemingly random point in time our dlm_senddd chews up 100% of the CPU... then dies down at on its own after quite a while. Killing SMB processes, shutting down SMB didn't seem to have any affect... only a reboot cures it. I've seen this described (if this is the same issue) as a "soft lockup" as it does seem to come back to life: http://lkml.org/lkml/2007/10/4/137 We've been assuming its a kernel/dlm version as we are running 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8 we were going to try a kernel update this week... but you seem to be using a later version and still have this problem ? Could you elaborate on "getting pounded by dlm" ? I've posted about this on this list in the past but received no assistance. On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote: > I have a GFS cluster with one node serving files via smb and nfs. > Under > fairly light usage (5-10 users) the cpu is getting pounded by dlm. I > am > using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). This > sounds > like the dlm issue mentioned back in March of last year > (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html > ) > that was resolved in 2.6.21. > > Has (or will) this fix be back ported to the current el5 kernel? > Will it > be in RHEL5.2? What is the easiest way for me to get this fix? > > Also, if I try a newer kernel on this node, will there be any harm > in the > other nodes using their current kernel? > > Thanks, > -Andrew > -- > Andrew A. Neuschwander, RHCE > Linux Systems Administrator > Numerical Terradynamic Simulation Group > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ David Ayre Programmer/Analyst - Information Technlogy Services Emily Carr Institute of Art and Design Vancouver, B.C. Canada 604-844-3875 / david at eciad.ca -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew at ntsg.umt.edu Wed Apr 2 00:51:01 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Tue, 1 Apr 2008 18:51:01 -0600 (MDT) Subject: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel In-Reply-To: References: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> Message-ID: <47567.10.8.105.69.1207097461.squirrel@secure.ntsg.umt.edu> My symptoms are similar. dlm_send sits on all of the cpu. Top shows the cpu spending nearly all of it's time in sys or interrupt handling. Disk and network I/O isn't very high (as seen via iostat and iptraf). But SMB/NFS throughput and latency are horrible. Context switches per second as seen by vmstat are in the 20,000+ range (I don't now if this is high though, I haven't really paid attention to this in the past). Nothing crashes, and it is still able to serve data (very slowly), and eventually the load and latency recovers. As an aside, does anyone know how to _view_ the resource group size after file system creation on GFS? Thanks, -Andrew On Tue, April 1, 2008 6:30 pm, David Ayre wrote: > What do you mean by pounded exactly ? > > We have an ongoing issue, similar... when we have about a dozen users > using both smb/nfs, and at some seemingly random point in time our > dlm_senddd chews up 100% of the CPU... then dies down at on its own > after quite a while. 
Killing SMB processes, shutting down SMB didn't > seem to have any affect... only a reboot cures it. I've seen this > described (if this is the same issue) as a "soft lockup" as it does > seem to come back to life: > > http://lkml.org/lkml/2007/10/4/137 > > We've been assuming its a kernel/dlm version as we are running > 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8 > > we were going to try a kernel update this week... but you seem to be > using a later version and still have this problem ? > > Could you elaborate on "getting pounded by dlm" ? I've posted about > this on this list in the past but received no assistance. > > > > > On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote: > >> I have a GFS cluster with one node serving files via smb and nfs. >> Under >> fairly light usage (5-10 users) the cpu is getting pounded by dlm. I >> am >> using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). This >> sounds >> like the dlm issue mentioned back in March of last year >> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html >> ) >> that was resolved in 2.6.21. >> >> Has (or will) this fix be back ported to the current el5 kernel? >> Will it >> be in RHEL5.2? What is the easiest way for me to get this fix? >> >> Also, if I try a newer kernel on this node, will there be any harm >> in the >> other nodes using their current kernel? >> >> Thanks, >> -Andrew >> -- >> Andrew A. Neuschwander, RHCE >> Linux Systems Administrator >> Numerical Terradynamic Simulation Group >> College of Forestry and Conservation >> The University of Montana >> http://www.ntsg.umt.edu >> andrew at ntsg.umt.edu - 406.243.6310 >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ > David Ayre > Programmer/Analyst - Information Technlogy Services > Emily Carr Institute of Art and Design > Vancouver, B.C. Canada > 604-844-3875 / david at eciad.ca > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From sajithks at cdactvm.in Wed Apr 2 04:57:59 2008 From: sajithks at cdactvm.in (sajith) Date: Wed, 2 Apr 2008 10:27:59 +0530 Subject: [Linux-cluster] linux cluster on rhel5 without using gfs and shared storage Message-ID: <200804020441.m324fXZ6024516@cdactvm.in> Hai all, I am new to linux cluster. I want to set up a two node cluster using rhcs. In my application I am using tomcat and mysql as database. My aim is to configure both servers in active-passive configuration. I have tested the failover of ip and process using conga. But I am stuck with the configuration of mysql failover. I am confused with how to make the data files redundant for mysql. If I am using nfs for data files the files are not accessible if nfs server is down. How can I create an online back up of my data files so that if my main server is down then also I can access my data from the online backup? I don't have gfs and SAN storage. My data files and application reside in same machine. Please help Regards, Sajith K.S ______________________________________ Scanned and protected by Email scanner From ebaydaan at gmail.com Tue Apr 1 19:42:37 2008 From: ebaydaan at gmail.com (Daan Biere) Date: Tue, 1 Apr 2008 21:42:37 +0200 Subject: [Linux-cluster] (newbie) mirrored data / cluster ? 
References: <20080331194027.101fdf09@danstation><9F633DE6C0E04F4691DCB713AC44C94B066E4C5B@EXCHANGE.SHSU.EDU><47F138E7.4000903@cmiware.com> <20080401212935.726ee726.tsucharz@poczta.onet.pl> Message-ID: Hi, i think the closest solution to AVS will be drbd: http://www.drbd.org/ DRBD takes over the data, writes it to the local disk and sends it to the other host. On the other host, it takes it to the disk there. The other components needed are a cluster membership service, which is supposed to be heartbeat, and some kind of application that works on top of a block device. Examples: A filesystem & fsck. A journaling FS. A database with recovery capabilities. ----- Original Message ----- From: "Tomasz Sucharzewski" To: Sent: Tuesday, April 01, 2008 9:29 PM Subject: Re: [Linux-cluster] (newbie) mirrored data / cluster ? > Hello, > > BTW do you know any software solution that supports asynchronous > replication on Linux like AVS on Solaris ? > > Best regards, > Tomek > > On Mon, 31 Mar 2008 14:17:59 -0500 > Chris Harms wrote: > >> The non-SAN option would be to use DRBD (http://www.drbd.org) and put >> NFS, Samba, etc on top of the DRBD partition. >> >> Chris >> >> MARTI, ROBERT JESSE wrote: >> > You don't have to have a mirrored LVM to do what youre trying to do. >> > You just need a common mountable share - typically a SAN or NAS. It >> > shouldn't be too hard to configure (and I've already done it). You >> > don't even *have* to have cluster suite - if you have a load balancer. >> > My brain isn't fast enough today to figure out how to share a load >> > without a load balanced VIP or a DNS round robin (which should be easy >> > to do as well). >> > >> > Rob Marti >> > Systems Analyst II >> > Sam Houston State University >> > >> > -----Original Message----- >> > From: linux-cluster-bounces at redhat.com >> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Daniel Maher >> > Sent: Monday, March 31, 2008 12:40 PM >> > To: linux-cluster at redhat.com >> > Subject: [Linux-cluster] (newbie) mirrored data / cluster ? >> > >> > Hello all, >> > >> > I have spent the day reading through the mailing list archives, Redhat >> > documentation, and CentOS forums, and - to be frank - my head is now >> > swimming with information. >> > >> > My scenario seems reasonably straightforward : I would like to have two >> > file servers which mirror each others' data, then i'd like those two >> > servers to act as a cluster, whereby they serve said data as if they >> > were one machine. If one of the servers suffers a critical failure, >> > the >> > other will stay up, and the data will continue to be accessible to the >> > rest of the network. >> > >> > I note with some trepidation that this might not be possible, as per >> > this document : >> > http://www.redhat.com/docs/manuals/enterprise/RHEL-5-manual/en-US/RHEL51 >> > 0/Cluster_Logical_Volume_Manager/mirrored_volumes.html >> > >> > However, i don't know if that document relates to the same scenario >> > i've >> > described above. I would very much appreciate any and all feedback, >> > links to further documentation, and any other information that you >> > might >> > like to share. >> > >> > Thank you ! 
>> > >> > >> > -- >> > Daniel Maher >> > >> > -- >> > Linux-cluster mailing list >> > Linux-cluster at redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Tomasz Sucharzewski > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From maciej.bogucki at artegence.com Wed Apr 2 08:54:42 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Wed, 02 Apr 2008 10:54:42 +0200 Subject: [Linux-cluster] linux cluster on rhel5 without using gfs and shared storage In-Reply-To: <200804020441.m324fXZ6024516@cdactvm.in> References: <200804020441.m324fXZ6024516@cdactvm.in> Message-ID: <47F349D2.9030203@artegence.com> sajith napisa?(a): > Hai all, > > I am new to linux cluster. I want to set up a two node cluster using > rhcs. In my application I am using tomcat and mysql as database. My aim is > to configure both servers in active-passive configuration. I have tested the > failover of ip and process using conga. But I am stuck with the > configuration of mysql failover. I am confused with how to make the data > files redundant for mysql. If I am using nfs for data files the files are > not accessible if nfs server is down. How can I create an online back up of > my data files so that if my main server is down then also I can access my > data from the online backup? I don't have gfs and SAN storage. My data files > and application reside in same machine. Please help Hello, You can use drbd[1] to mirror block device via network. You also need some automatic failover mechanism fe. RHCS or heratbeat[2]. [1] - http://www.drbd.org/ [2] - http://www.linux-ha.org/ Best Regards Maciej Bogucki From swhiteho at redhat.com Wed Apr 2 09:53:34 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Apr 2008 10:53:34 +0100 Subject: [Linux-cluster] About GFS1 and I/O barriers. In-Reply-To: <20080331151622.1360a2cb@mathieu.toulouse> References: <20080328153458.45fc6e13@mathieu.toulouse> <20080331124651.3f0d2428@mathieu.toulouse> <1206960860.3635.126.camel@quoit> <20080331151622.1360a2cb@mathieu.toulouse> Message-ID: <1207130014.3310.24.camel@localhost.localdomain> Hi, On Mon, 2008-03-31 at 15:16 +0200, Mathieu Avila wrote: > Le Mon, 31 Mar 2008 11:54:20 +0100, > Steven Whitehouse a ?crit : > > > Hi, > > > > Hi, > > > Both GFS1 and GFS2 are safe from this problem since neither of them > > use barriers. Instead we do a flush at the critical points to ensure > > that all data is on disk before proceeding with the next stage. > > > > I don't think this solves the problem. > > Consider a cheap iSCSI disk (no NVRAM, no UPS) accessed by all my GFS > nodes; this disk has a write cache enabled, which means it will reply > that write requests are performed even if they are not really written > on the platters. The disk (like most disks nowadays) has some logic > that allows it to optimize writes by re-scheduling them. It is possible > that all writes are ACK'd before the power failure, but only a fraction > of them were really performed : some are before the flush, some are > after the flush. > --Not all blocks writes before the flush were performed but other > blocks after the flush are written -> the FS is corrupted.-- > So, after the power failure all data in the disk's write cache are > forgotten. 
If the journal data was in the disk cache, the journal was > not written to disk, but other metadata have been written, so there are > metadata inconsistencies. > I don't agree that write caching implies that I/O must be acked before it has hit disk. It might well be reordered (which is ok), but if we wait for all outstanding I/O completions, then we ought to be able to be sure that all I/O is actually on disk, or at the very least that further I/O will not be reordered with already ACKed data. If devices are sending ACKs in advance of the I/O hitting disk then I think thats broken behaviour. Consider what happens if a device was to send an ACK for a write and then it discovers an uncorrectable error during the write - how would it then be able to report it since it had already sent an "ok"? So far as I can see the only reason for having the drive send an I/O completion back is to report the success or otherwise of the operation, and if that operation hasn't been completed, then we might just as well not wait for ACKs. > This is the problem that I/O barriers try to solve, by really forcing > the block device (and the block layer) to have all blocks issued before > the barrier to be written before any other after the barrier starts > begin written. > > The other solution is to completely disable the write cache of the > disks, but this leads to dramatically bad performances. > If its a choice between poor performance thats correct and good performance which might lose data, then I know which I would choose :-) Not all devices support barriers, so it always has to be an option; ext3 uses the barrier=1 mount option for this reason, and if it fails (e.g. if the underlying device doesn't support barriers) it falls back to the same technique which we are using in gfs1/2. The other thing to bear in mind is that barriers, as currently implemented are not really that great either. It would be nice to replace them with something that allows better performance with (for example) mirrors where the only current method of implementing the barrier is to wait for all the I/O completions from all the disks in the mirror set (and thus we are back to waiting for outstanding I/O again). Steve. From Bennie_R_Thomas at raytheon.com Wed Apr 2 14:05:27 2008 From: Bennie_R_Thomas at raytheon.com (Bennie Thomas) Date: Wed, 02 Apr 2008 09:05:27 -0500 Subject: [Linux-cluster] linux cluster on rhel5 without using gfs and shared storage In-Reply-To: <47F349D2.9030203@artegence.com> References: <200804020441.m324fXZ6024516@cdactvm.in> <47F349D2.9030203@artegence.com> Message-ID: <47F392A7.7060704@raytheon.com> You can attach a network disk device and have it fail over with the active system and make tomcat and mysql dependant of the disk resource. This is the simple route. when dealing with clusters you should keep the "KISS" approach in-mind. Regards, Bennie Any views or opinions presented are solely those of the author and do not necessarily represent those of Raytheon unless specifically stated. Electronic communications including email might be monitored by Raytheon. for operational or business reasons. Maciej Bogucki wrote: > sajith napisa?(a): > >> Hai all, >> >> I am new to linux cluster. I want to set up a two node cluster using >> rhcs. In my application I am using tomcat and mysql as database. My aim is >> to configure both servers in active-passive configuration. I have tested the >> failover of ip and process using conga. But I am stuck with the >> configuration of mysql failover. 
I am confused with how to make the data >> files redundant for mysql. If I am using nfs for data files the files are >> not accessible if nfs server is down. How can I create an online back up of >> my data files so that if my main server is down then also I can access my >> data from the online backup? I don't have gfs and SAN storage. My data files >> and application reside in same machine. Please help >> > > Hello, > > You can use drbd[1] to mirror block device via network. You also need > some automatic failover mechanism fe. RHCS or heratbeat[2]. > > [1] - http://www.drbd.org/ > [2] - http://www.linux-ha.org/ > > Best Regards > Maciej Bogucki > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bennie_R_Thomas at raytheon.com Wed Apr 2 14:13:13 2008 From: Bennie_R_Thomas at raytheon.com (Bennie Thomas) Date: Wed, 02 Apr 2008 09:13:13 -0500 Subject: [Linux-cluster] linux cluster on rhel5 without using gfs and shared storage In-Reply-To: <47F392A7.7060704@raytheon.com> References: <200804020441.m324fXZ6024516@cdactvm.in> <47F349D2.9030203@artegence.com> <47F392A7.7060704@raytheon.com> Message-ID: <47F39479.9050203@raytheon.com> I guess ignore my last reply; I just read the title in it's entirety. He does not want to use shared storage. Sorry !!! Bennie Thomas wrote: > You can attach a network disk device and have it fail over with the > active system and make > tomcat and mysql dependant of the disk resource. This is the simple > route. when dealing with > clusters you should keep the "KISS" approach in-mind. > > Regards, > > Bennie > > Any views or opinions presented are solely those of the author and do > not necessarily represent those > of Raytheon unless specifically stated. Electronic communications > including email might be monitored > by Raytheon. for operational or business reasons. > > > Maciej Bogucki wrote: >> sajith napisa?(a): >> >>> Hai all, >>> >>> I am new to linux cluster. I want to set up a two node cluster using >>> rhcs. In my application I am using tomcat and mysql as database. My aim is >>> to configure both servers in active-passive configuration. I have tested the >>> failover of ip and process using conga. But I am stuck with the >>> configuration of mysql failover. I am confused with how to make the data >>> files redundant for mysql. If I am using nfs for data files the files are >>> not accessible if nfs server is down. How can I create an online back up of >>> my data files so that if my main server is down then also I can access my >>> data from the online backup? I don't have gfs and SAN storage. My data files >>> and application reside in same machine. Please help >>> >> >> Hello, >> >> You can use drbd[1] to mirror block device via network. You also need >> some automatic failover mechanism fe. RHCS or heratbeat[2]. >> >> [1] - http://www.drbd.org/ >> [2] - http://www.linux-ha.org/ >> >> Best Regards >> Maciej Bogucki >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s.wendy.cheng at gmail.com Wed Apr 2 14:26:58 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 2 Apr 2008 10:26:58 -0400 Subject: [Linux-cluster] About GFS1 and I/O barriers. In-Reply-To: <1207130014.3310.24.camel@localhost.localdomain> References: <20080328153458.45fc6e13@mathieu.toulouse> <20080331124651.3f0d2428@mathieu.toulouse> <1206960860.3635.126.camel@quoit> <20080331151622.1360a2cb@mathieu.toulouse> <1207130014.3310.24.camel@localhost.localdomain> Message-ID: <1a2a6dd60804020726g20d77419k47298eb000c431ec@mail.gmail.com> On Wed, Apr 2, 2008 at 5:53 AM, Steven Whitehouse wrote: > Hi, > > On Mon, 2008-03-31 at 15:16 +0200, Mathieu Avila wrote: > > Le Mon, 31 Mar 2008 11:54:20 +0100, > > Steven Whitehouse a ?crit : > > > > > Hi, > > > > > > > Hi, > > > > > Both GFS1 and GFS2 are safe from this problem since neither of them > > > use barriers. Instead we do a flush at the critical points to ensure > > > that all data is on disk before proceeding with the next stage. > > > > > > > I don't think this solves the problem. > > > > Consider a cheap iSCSI disk (no NVRAM, no UPS) accessed by all my GFS > > nodes; this disk has a write cache enabled, which means it will reply > > that write requests are performed even if they are not really written > > on the platters. The disk (like most disks nowadays) has some logic > > that allows it to optimize writes by re-scheduling them. It is possible > > that all writes are ACK'd before the power failure, but only a fraction > > of them were really performed : some are before the flush, some are > > after the flush. > > --Not all blocks writes before the flush were performed but other > > blocks after the flush are written -> the FS is corrupted.-- > > So, after the power failure all data in the disk's write cache are > > forgotten. If the journal data was in the disk cache, the journal was > > not written to disk, but other metadata have been written, so there are > > metadata inconsistencies. > > > I don't agree that write caching implies that I/O must be acked before > it has hit disk. It might well be reordered (which is ok), but if we > wait for all outstanding I/O completions, then we ought to be able to be > sure that all I/O is actually on disk, or at the very least that further > I/O will not be reordered with already ACKed data. If devices are > sending ACKs in advance of the I/O hitting disk then I think thats > broken behaviour. You seem to assume when disk subsystem acks back, the data is surely on disk. That is not correct . You may consider it a brokoen behavior, mostly from firmware bugs, but it occurs more often than you would expect. The problem is extremely difficult to debug from host side. So I think the proposal here is how the filesystem should protect itself from this situation (though I'm fuzzy about what the actual proposal is without looking into other subsystems, particularly volume manager, that are involved) You can not say "oh, then I don't have the responsibility. Please go to talk to disk vendors". Serious implementations have been trying to find good ways to solve this issue. -- Wendy Consider what happens if a device was to send an ACK for a write and > then it discovers an uncorrectable error during the write - how would it > then be able to report it since it had already sent an "ok"? 
So far as I > can see the only reason for having the drive send an I/O completion back > is to report the success or otherwise of the operation, and if that > operation hasn't been completed, then we might just as well not wait for > ACKs. > > > This is the problem that I/O barriers try to solve, by really forcing > > the block device (and the block layer) to have all blocks issued before > > the barrier to be written before any other after the barrier starts > > begin written. > > > > The other solution is to completely disable the write cache of the > > disks, but this leads to dramatically bad performances. > > > If its a choice between poor performance thats correct and good > performance which might lose data, then I know which I would choose :-) > Not all devices support barriers, so it always has to be an option; ext3 > uses the barrier=1 mount option for this reason, and if it fails (e.g. > if the underlying device doesn't support barriers) it falls back to the > same technique which we are using in gfs1/2. > > The other thing to bear in mind is that barriers, as currently > implemented are not really that great either. It would be nice to > replace them with something that allows better performance with (for > example) mirrors where the only current method of implementing the > barrier is to wait for all the I/O completions from all the disks in the > mirror set (and thus we are back to waiting for outstanding I/O again). > > Steve. > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cluster at defuturo.co.uk Wed Apr 2 14:02:58 2008 From: cluster at defuturo.co.uk (Robert Clark) Date: Wed, 02 Apr 2008 15:02:58 +0100 Subject: [Linux-cluster] clvmd hang Message-ID: <1207144978.2373.34.camel@rutabaga.defuturo.co.uk> I'm having some problems with clvmd hanging on our 8-node cluster. Once hung, any lvm commands wait indefinitely. This normally happens when starting up the cluster or if multiple nodes reboot. After some experimentation I've managed to reproduce it consistently on a smaller 3-node test cluster by stopping clvmd on one node and then running vgscan on another. The vgscan will hang together with clvmd. Restarting clvmd on the stopped node doesn't wake it up. Once hung, an strace shows 3 clvmd threads, 2 waiting on futexes and one trying to read from /dev/misc/dlm_clvmd. All 3 threads wait indefinitely on these system calls. Here's the last part of the strace: [pid 2951] select(1024, [4 6], NULL, NULL, {90, 0}) = 1 (in [4], left {56, 190000}) [pid 2951] accept(4, {sa_family=AF_FILE, path=@}, [2]) = 5 [pid 2951] ioctl(6, 0x7805, 0) = 1 [pid 2951] select(1024, [4 5 6], NULL, NULL, {90, 0}) = 1 (in [5], left {90, 0}) [pid 2951] read(5, "3\0\0\0\0\0\0\0\0\0\0\0\v\0\0\0\0\4\4P_global\0\0", 4096) = 29 [pid 2951] futex(0x84d64f4, FUTEX_WAIT, 2, NULL P_global doesn't show up in /proc/cluster/dlm_locks at this point. 
Here's what I can get from dlm_debug: clvmd rebuilt 5 resources clvmd purge requests clvmd purged 0 requests clvmd mark waiting requests clvmd marked 0 requests clvmd purge locks of departed nodes clvmd purged 0 locks clvmd update remastered resources clvmd updated 0 resources clvmd rebuild locks clvmd rebuilt 0 locks clvmd recover event 22 done clvmd move flags 0,0,1 ids 11,22,22 clvmd process held requests clvmd processed 0 requests clvmd resend marked requests clvmd resent 0 requests clvmd recover event 22 finished clvmd move flags 1,0,0 ids 22,22,22 clvmd move flags 0,1,0 ids 22,23,22 clvmd move use event 23 clvmd recover event 23 clvmd add node 1 clvmd total nodes 3 clvmd rebuild resource directory clvmd rebuilt 5 resources clvmd purge requests clvmd purged 0 requests clvmd mark waiting requests clvmd marked 0 requests clvmd recover event 23 done clvmd move flags 0,0,1 ids 22,23,23 clvmd process held requests clvmd processed 0 requests clvmd resend marked requests clvmd resent 0 requests clvmd recover event 23 finished I'm running 4.6 with kernel-hugemem-2.6.9-67.0.7.EL, lvm2-cluster-2.02.27-2.el4_6.2 & dlm-kernel-hugemem-2.6.9-52.5. Has anyone else seen anything like this? Thanks, Robert From ccaulfie at redhat.com Wed Apr 2 14:43:22 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 02 Apr 2008 15:43:22 +0100 Subject: [Linux-cluster] clvmd hang In-Reply-To: <1207144978.2373.34.camel@rutabaga.defuturo.co.uk> References: <1207144978.2373.34.camel@rutabaga.defuturo.co.uk> Message-ID: <47F39B8A.3040107@redhat.com> Robert Clark wrote: > I'm having some problems with clvmd hanging on our 8-node cluster. > Once hung, any lvm commands wait indefinitely. This normally happens > when starting up the cluster or if multiple nodes reboot. After some > experimentation I've managed to reproduce it consistently on a smaller > 3-node test cluster by stopping clvmd on one node and then running > vgscan on another. The vgscan will hang together with clvmd. Restarting > clvmd on the stopped node doesn't wake it up. > > Once hung, an strace shows 3 clvmd threads, 2 waiting on futexes and > one trying to read from /dev/misc/dlm_clvmd. All 3 threads wait > indefinitely on these system calls. Here's the last part of the strace: > > [pid 2951] select(1024, [4 6], NULL, NULL, {90, 0}) = 1 (in [4], left {56, 190000}) > [pid 2951] accept(4, {sa_family=AF_FILE, path=@}, [2]) = 5 > [pid 2951] ioctl(6, 0x7805, 0) = 1 > [pid 2951] select(1024, [4 5 6], NULL, NULL, {90, 0}) = 1 (in [5], left {90, 0}) > [pid 2951] read(5, "3\0\0\0\0\0\0\0\0\0\0\0\v\0\0\0\0\4\4P_global\0\0", 4096) = 29 > [pid 2951] futex(0x84d64f4, FUTEX_WAIT, 2, NULL > > P_global doesn't show up in /proc/cluster/dlm_locks at this point. 
> Here's what I can get from dlm_debug: > > clvmd rebuilt 5 resources > clvmd purge requests > clvmd purged 0 requests > clvmd mark waiting requests > clvmd marked 0 requests > clvmd purge locks of departed nodes > clvmd purged 0 locks > clvmd update remastered resources > clvmd updated 0 resources > clvmd rebuild locks > clvmd rebuilt 0 locks > clvmd recover event 22 done > clvmd move flags 0,0,1 ids 11,22,22 > clvmd process held requests > clvmd processed 0 requests > clvmd resend marked requests > clvmd resent 0 requests > clvmd recover event 22 finished > clvmd move flags 1,0,0 ids 22,22,22 > clvmd move flags 0,1,0 ids 22,23,22 > clvmd move use event 23 > clvmd recover event 23 > clvmd add node 1 > clvmd total nodes 3 > clvmd rebuild resource directory > clvmd rebuilt 5 resources > clvmd purge requests > clvmd purged 0 requests > clvmd mark waiting requests > clvmd marked 0 requests > clvmd recover event 23 done > clvmd move flags 0,0,1 ids 22,23,23 > clvmd process held requests > clvmd processed 0 requests > clvmd resend marked requests > clvmd resent 0 requests > clvmd recover event 23 finished > > I'm running 4.6 with kernel-hugemem-2.6.9-67.0.7.EL, > lvm2-cluster-2.02.27-2.el4_6.2 & dlm-kernel-hugemem-2.6.9-52.5. Has > anyone else seen anything like this? > Yes, we seem to have collected quite a few bugzillas on the subject! The fix is in CVS for LVM2. Packages are on their way I believe. -- Chrissie From tiagocruz at forumgdh.net Wed Apr 2 15:08:53 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Wed, 02 Apr 2008 12:08:53 -0300 Subject: [Linux-cluster] Why my cluster stop to work when one node down? Message-ID: <1207148933.27447.6.camel@tuxkiller.ig.com.br> Hello guys, I have one cluster with two machines, running RHEL 5.1 x86_64. The Storage device has imported using GNDB and formated using GFS, to mount on both nodes: [root at teste-spo-la-v1 ~]# gnbd_import -v -l Device name : cluster ---------------------- Minor # : 0 sysfs name : /block/gnbd0 Server : gnbdserv Port : 14567 State : Open Connected Clear Readonly : No Sectors : 20971520 # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster # mount /dev/gnbd/cluster /mnt/ Everything works graceful, until one node get out (shutdown, network stop, xm destroy...) teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46 Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251 Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1 Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery. 
Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3 Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252) Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state. Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251 Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2 Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. So then, my GFS mount point has broken... the terminal freeze when I try to access the directory "/mnt" and just come back when the second node has back again to the cluster. Follow the cluster.conf: Thanks! -- Tiago Cruz http://everlinux.com Linux User #282636 From gordan at bobich.net Wed Apr 2 15:16:16 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 2 Apr 2008 16:16:16 +0100 (BST) Subject: [Linux-cluster] Why my cluster stop to work when one node down? In-Reply-To: <1207148933.27447.6.camel@tuxkiller.ig.com.br> References: <1207148933.27447.6.camel@tuxkiller.ig.com.br> Message-ID: Replace: with in cluster.conf. Gordan On Wed, 2 Apr 2008, Tiago Cruz wrote: > Hello guys, > > I have one cluster with two machines, running RHEL 5.1 x86_64. > The Storage device has imported using GNDB and formated using GFS, to > mount on both nodes: > > [root at teste-spo-la-v1 ~]# gnbd_import -v -l > Device name : cluster > ---------------------- > Minor # : 0 > sysfs name : /block/gnbd0 > Server : gnbdserv > Port : 14567 > State : Open Connected Clear > Readonly : No > Sectors : 20971520 > > # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster > # mount /dev/gnbd/cluster /mnt/ > > Everything works graceful, until one node get out (shutdown, network > stop, xm destroy...) > > > teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep. 
> Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46 > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251 > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1 > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE > Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3 > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: > Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252) > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state. > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251 > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2 > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused > Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. > Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused > Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. > > > So then, my GFS mount point has broken... the terminal freeze when I try > to access the directory "/mnt" and just come back when the second node > has back again to the cluster. > > > Follow the cluster.conf: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > -- > Tiago Cruz > http://everlinux.com > Linux User #282636 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From swhiteho at redhat.com Wed Apr 2 15:17:08 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 02 Apr 2008 16:17:08 +0100 Subject: [Linux-cluster] About GFS1 and I/O barriers. 
In-Reply-To: <1a2a6dd60804020726g20d77419k47298eb000c431ec@mail.gmail.com> References: <20080328153458.45fc6e13@mathieu.toulouse> <20080331124651.3f0d2428@mathieu.toulouse> <1206960860.3635.126.camel@quoit> <20080331151622.1360a2cb@mathieu.toulouse> <1207130014.3310.24.camel@localhost.localdomain> <1a2a6dd60804020726g20d77419k47298eb000c431ec@mail.gmail.com> Message-ID: <1207149428.3635.151.camel@quoit> Hi, On Wed, 2008-04-02 at 10:26 -0400, Wendy Cheng wrote: > > > On Wed, Apr 2, 2008 at 5:53 AM, Steven Whitehouse > wrote: > Hi, > > On Mon, 2008-03-31 at 15:16 +0200, Mathieu Avila wrote: > > Le Mon, 31 Mar 2008 11:54:20 +0100, > > Steven Whitehouse a ?crit : > > > > > Hi, > > > > > > > Hi, > > > > > Both GFS1 and GFS2 are safe from this problem since > neither of them > > > use barriers. Instead we do a flush at the critical points > to ensure > > > that all data is on disk before proceeding with the next > stage. > > > > > > > I don't think this solves the problem. > > > > Consider a cheap iSCSI disk (no NVRAM, no UPS) accessed by > all my GFS > > nodes; this disk has a write cache enabled, which means it > will reply > > that write requests are performed even if they are not > really written > > on the platters. The disk (like most disks nowadays) has > some logic > > that allows it to optimize writes by re-scheduling them. It > is possible > > that all writes are ACK'd before the power failure, but only > a fraction > > of them were really performed : some are before the flush, > some are > > after the flush. > > --Not all blocks writes before the flush were performed but > other > > blocks after the flush are written -> the FS is corrupted.-- > > So, after the power failure all data in the disk's write > cache are > > forgotten. If the journal data was in the disk cache, the > journal was > > not written to disk, but other metadata have been written, > so there are > > metadata inconsistencies. > > > > I don't agree that write caching implies that I/O must be > acked before > it has hit disk. It might well be reordered (which is ok), but > if we > wait for all outstanding I/O completions, then we ought to be > able to be > sure that all I/O is actually on disk, or at the very least > that further > I/O will not be reordered with already ACKed data. If devices > are > sending ACKs in advance of the I/O hitting disk then I think > thats > broken behaviour. > > You seem to assume when disk subsystem acks back, the data is surely > on disk. That is not correct . You may consider it a brokoen behavior, > mostly from firmware bugs, but it occurs more often than you would > expect. The problem is extremely difficult to debug from host side. So > I think the proposal here is how the filesystem should protect itself > from this situation (though I'm fuzzy about what the actual proposal > is without looking into other subsystems, particularly volume manager, > that are involved) You can not say "oh, then I don't have the > responsibility. Please go to talk to disk vendors". Serious > implementations have been trying to find good ways to solve this > issue. > > -- Wendy > If the data is not physically on disk when the ACK it sent back, then there is no way for the fs to know whether the data has (at a later date) not been written due to some error or other. Even ignoring that for the moment and assuming that such errors never occur, I don't think its too unreasonable to expect at a minimum that all acknowledged I/O will never be reordered with unacknowledged I/O. 
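(If you're not sure which protocol the filesystem was built with, it can be read back from the superblock - something like "gfs_tool sb /dev/your_vg/your_lv proto" for GFS1, or "gfs2_tool sb /dev/your_vg/your_lv proto" for GFS2; the device path here is just a placeholder for whatever you actually mounted. A filesystem built with lock_nolock must only ever be mounted on a single node - shared between two nodes it can easily produce files that one node never sees, whereas lock_dlm keeps the directory views coherent across the cluster.)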
That is all that is required for correct operation of gfs1/2 provided that no media errors occur on write. The message on lkml which Mathieu referred to suggested that there were three kinds of devices, but it seems to be that type 2 (flushable) doesn't exist so far as the fs is concerned since blkdev_issue_flush() just issues a BIO with only a barrier in it. A device driver might support the barrier request by either waiting for all outstanding I/O and issuing a flush command (if required) or by passing the barrier down to the device, assuming that it supports such a thing directly. Further down the message (the url is http://lkml.org/lkml/2007/5/25/71 btw) there is a list of dm/md implementation status and it seems that for a good number of the common targets there is little or no support for barriers anyway at the moment. Now I agree that it would be nice to support barriers in GFS2, but it won't solve any problems relating to ordering of I/O unless all of the underlying device supports them too. See also Alasdair's response to the thread: http://lkml.org/lkml/2007/5/28/81 So although I'd like to see barrier support in GFS2, it won't solve any problems for most people and really its a device/block layer issue at the moment. Steve. From siddiqut at gmail.com Wed Apr 2 15:30:31 2008 From: siddiqut at gmail.com (Tajdar Siddiqui) Date: Wed, 2 Apr 2008 11:30:31 -0400 Subject: [Linux-cluster] writing to GFS from multiple JVM's concurrently Message-ID: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> Hi, We are evaluating GFS for use as a highly concurrent distributed file system. What I have observed: When 2 JVM's (multiple Threads per Java Virtual Machine) are writing to the same directory on GFS, on of the JVM doesn't see the files it writes on the GFS. The Writer Threads on JVM think they're done, but the files don't show up on "ls" etc. The other JVM works fine. This problem goes away if the 2 JVM's write to different directories on GFS OR Only one JVM is writing at a time. Any ideas on this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Wed Apr 2 15:38:50 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 2 Apr 2008 16:38:50 +0100 (BST) Subject: [Linux-cluster] writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> Message-ID: On Wed, 2 Apr 2008, Tajdar Siddiqui wrote: > When 2 JVM's? (multiple Threads per Java Virtual Machine) are writing to the same directory > on GFS, on of the JVM doesn't see the files it writes on the GFS. > The Writer Threads on JVM think they're done, but the files don't show up on "ls" etc. > The other JVM works fine. > > This problem goes away if the 2 JVM's write to different directories on GFS > > OR > > Only one JVM is writing at a time. > > Any ideas on this. This may sound like a daft question, but did you test it on ext3? Are the JVMs on the same node? What locking protocol are you using? 
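A quick way to answer that last question on a running node, as a sketch (this assumes the RHEL 5 stack discussed elsewhere in the thread; /gfs stands in for the real mount point and output details vary by version):

cman_tool nodes          # both nodes should show up as cluster members
cman_tool services       # the same DLM lockspace / GFS mount group should
                         # be listed on both nodes
gfs_tool df /gfs         # prints superblock info including the lock
                         # protocol: lock_dlm means shared cluster locking,
                         # lock_nolock means the node is mounting as if it
                         # were alone and changes will not be coherent

If the lock protocol turns out to be lock_nolock, one node not seeing files written by the other is exactly what you would expect.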
Gordan From cluster at defuturo.co.uk Wed Apr 2 15:48:06 2008 From: cluster at defuturo.co.uk (Robert Clark) Date: Wed, 02 Apr 2008 16:48:06 +0100 Subject: [Linux-cluster] clvmd hang In-Reply-To: <47F39B8A.3040107@redhat.com> References: <1207144978.2373.34.camel@rutabaga.defuturo.co.uk> <47F39B8A.3040107@redhat.com> Message-ID: <1207151286.2373.45.camel@rutabaga.defuturo.co.uk> On Wed, 2008-04-02 at 15:43 +0100, Christine Caulfield wrote: > > Has anyone else seen anything like this? > Yes, we seem to have collected quite a few bugzillas on the subject! The > fix is in CVS for LVM2. Packages are on their way I believe. Ah yes. I searched BZ for dlm bugs but forgot to check for lvm2-cluster ones... I'll test again after bz#435491 is closed. Thanks, Robert From s.wendy.cheng at gmail.com Wed Apr 2 15:57:44 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 2 Apr 2008 11:57:44 -0400 Subject: [Linux-cluster] About GFS1 and I/O barriers. In-Reply-To: <1207149428.3635.151.camel@quoit> References: <20080328153458.45fc6e13@mathieu.toulouse> <20080331124651.3f0d2428@mathieu.toulouse> <1206960860.3635.126.camel@quoit> <20080331151622.1360a2cb@mathieu.toulouse> <1207130014.3310.24.camel@localhost.localdomain> <1a2a6dd60804020726g20d77419k47298eb000c431ec@mail.gmail.com> <1207149428.3635.151.camel@quoit> Message-ID: <1a2a6dd60804020857w16ddeaebs41f916f4a01792c3@mail.gmail.com> On Wed, Apr 2, 2008 at 11:17 AM, Steven Whitehouse wrote: > > Now I agree that it would be nice to support barriers in GFS2, but it > won't solve any problems relating to ordering of I/O unless all of the > underlying device supports them too. See also Alasdair's response to the > thread: http://lkml.org/lkml/2007/5/28/81 I'm not suggesting GFS1/2 should take this patch, considering their current states. However, you can't give people an impression, as your original reply implying, that GFS1/2 would not have this problem. > > So although I'd like to see barrier support in GFS2, it won't solve any > problems for most people and really its a device/block layer issue at > the moment. This part I agree ... better to attack this issue from volume manager than from filesystem. -- Wendy -------------- next part -------------- An HTML attachment was scrubbed... URL: From tiagocruz at forumgdh.net Wed Apr 2 15:59:37 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Wed, 02 Apr 2008 12:59:37 -0300 Subject: [Linux-cluster] Why my cluster stop to work when one node down? In-Reply-To: References: <1207148933.27447.6.camel@tuxkiller.ig.com.br> Message-ID: <1207151977.27447.10.camel@tuxkiller.ig.com.br> Nice ?Gordan!!! It works now!! :-p "?Quorum" its the number minimum of nodes on the cluster? [root at teste-spo-la-v1 ~]# cman_tool status Version: 6.0.1 Config Version: 3 Cluster Name: mycluster Cluster Id: 56756 Cluster Member: Yes Cluster Generation: 140 Membership state: Cluster-Member Nodes: 2 Expected votes: 1 Total votes: 2 Quorum: 1 Active subsystems: 8 Flags: 2node Ports Bound: 0 11 177 Node name: node1.mycluster.com Node ID: 1 Multicast addresses: 239.192.221.146 Node addresses: 10.25.0.251 ?Many thanks!! On Wed, 2008-04-02 at 16:16 +0100, gordan at bobich.net wrote: > Replace: > > > > > with > > > > in cluster.conf. > > Gordan > > On Wed, 2 Apr 2008, Tiago Cruz wrote: > > > Hello guys, > > > > I have one cluster with two machines, running RHEL 5.1 x86_64. 
> > The Storage device has imported using GNDB and formated using GFS, to > > mount on both nodes: > > > > [root at teste-spo-la-v1 ~]# gnbd_import -v -l > > Device name : cluster > > ---------------------- > > Minor # : 0 > > sysfs name : /block/gnbd0 > > Server : gnbdserv > > Port : 14567 > > State : Open Connected Clear > > Readonly : No > > Sectors : 20971520 > > > > # gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster > > # mount /dev/gnbd/cluster /mnt/ > > > > Everything works graceful, until one node get out (shutdown, network > > stop, xm destroy...) > > > > > > teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46 > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251 > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1 > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE > > Apr 2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3 > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: > > Apr 2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: #1: Quorum Dissolved > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.252) > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] CLM CONFIGURATION CHANGE > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] New Configuration: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] r(0) ip(10.25.0.251) > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Left: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] Members Joined: > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state. > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM ] got nodejoin message 10.25.0.251 > > Apr 2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG ] got joinlist message from node 2 > > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. > > Apr 2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused > > Apr 2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. 
> > Apr 2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused > > Apr 2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate. Refusing connection. > > > > > > So then, my GFS mount point has broken... the terminal freeze when I try > > to access the directory "/mnt" and just come back when the second node > > has back again to the cluster. > > > > > > Follow the cluster.conf: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > -- > > Tiago Cruz > > http://everlinux.com > > Linux User #282636 > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Tiago Cruz http://everlinux.com Linux User #282636 From gordan at bobich.net Wed Apr 2 16:13:16 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 2 Apr 2008 17:13:16 +0100 (BST) Subject: [Linux-cluster] Why my cluster stop to work when one node down? In-Reply-To: <1207151977.27447.10.camel@tuxkiller.ig.com.br> References: <1207148933.27447.6.camel@tuxkiller.ig.com.br> <1207151977.27447.10.camel@tuxkiller.ig.com.br> Message-ID: > Nice ?Gordan!!! > > It works now!! :-p You're welcome. :) > "?Quorum" its the number minimum of nodes on the cluster? Yes, it's the minimum number of nodes required for the cluster to start. This is (n+1)/2, round up number of nodes defined in cluster.conf. This ensures that the cluster can't split-brain. In the 2-node case this needs to be adjusted which is what the two_node parameter does. There's higher risk of splitbrain, though, but you can use tie-breakers of some sort. Gordan From rohara at redhat.com Wed Apr 2 16:23:43 2008 From: rohara at redhat.com (Ryan O'Hara) Date: Wed, 02 Apr 2008 11:23:43 -0500 Subject: [Linux-cluster] SCSI reservation conflicts after update In-Reply-To: <47ED9556.9050000@amnh.org> References: <47ED9556.9050000@amnh.org> Message-ID: <47F3B30F.2050502@redhat.com> I went back and investigated why this might happen. Seems that I had seen it before but could not recall how this sort of thing happens. For 4.6, the scsi_reserve script should only be run if you intend to use SCSI reservations as a fence mechanism, as you correctly pointed out at the end of your message. I believe in 4.6 scsi_reserve was incorrectly enabled by default. The real problem is that the keys used for scsi reservations are based on node ID. For this reason, it is required that nodeid be defined in the cluster.conf file for all nodes. Without this, the nodeid can change from node to node between cluster restarts, etc. The scsi_reserve and fence_scsi scripts require consistent nodeid (ie. they do not change). So I think the problem we are seeing is that running 'scsi_reserve stop' cannot work since that will attempt to remove that node's key from the devices. If that key has changed (the node ID changed), it will not find a matching registration key on the device and thus fail. The best bet is to disable scsi_reserve and to clear all scsi reservations. As you mentioned, the sg_persist command with the -C option should do the trick. I am guessing that the reason that failed for you is that you must supply the device name AND the key being used for that I_T nexus. 
You can use sg_persist to list the keys registered with a particular device, but since nodeid's may have changed you might have to guess the key for a particular node (ie. the node you run the sg_persist -C command on). The good news is that when you identify the correct key it will clear all the keys. Ryan Sajesh Singh wrote: > After updating my GFS cluster to the latest packages (as of 3/28/08) on > an Enterprise Linux 4.6 cluster (kernel version 2.6.9-67.0.7.ELsmp) I > am receiving scsi reservation errors whenever the nodes are rebooted. > The node is then subsequently rebooted at varying intervals without any > intervention. I have tried to disable the scsi_reserve script from > startup, but it does not seem to have any effect. I have also tried to > use the sg_persist command to clear all reservations with the -C option > to no avail. I first noticed something was wrong when the 2nd node of > the 2 node cluster was being updated. That was the first sign of the > scsi reservation errors on the console. > > From my understanding persistent SCSI reservations are only needed if I > am using the fence_scsi module. > > I would appreciate any guidance. > > Regards, > > Sajesh Singh > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ssingh at amnh.org Wed Apr 2 16:36:11 2008 From: ssingh at amnh.org (Sajesh Singh) Date: Wed, 02 Apr 2008 12:36:11 -0400 Subject: [Linux-cluster] SCSI reservation conflicts after update In-Reply-To: <47F3B30F.2050502@redhat.com> References: <47ED9556.9050000@amnh.org> <47F3B30F.2050502@redhat.com> Message-ID: <47F3B5FB.3090107@amnh.org> Ryan and all else that have answered, Thank you for the info on scsi_reserve. I have disabled the script and all seems okay. What is a little confusing is that the script/service was enabled before the upgrade, but did not cause any scsi reservation conflicts. -Sajesh- Ryan O'Hara wrote: > > I went back and investigated why this might happen. Seems that I had > seen it before but could not recall how this sort of thing happens. > > For 4.6, the scsi_reserve script should only be run if you intend to > use SCSI reservations as a fence mechanism, as you correctly pointed > out at the end of your message. I believe in 4.6 scsi_reserve was > incorrectly enabled by default. > > The real problem is that the keys used for scsi reservations are based > on node ID. For this reason, it is required that nodeid be defined in > the cluster.conf file for all nodes. Without this, the nodeid can > change from node to node between cluster restarts, etc. The > scsi_reserve and fence_scsi scripts require consistent nodeid (ie. > they do not change). > > So I think the problem we are seeing is that running 'scsi_reserve > stop' cannot work since that will attempt to remove that node's key > from the devices. If that key has changed (the node ID changed), it > will not find a matching registration key on the device and thus fail. > > The best bet is to disable scsi_reserve and to clear all scsi > reservations. As you mentioned, the sg_persist command with the -C > option should do the trick. I am guessing that the reason that failed > for you is that you must supply the device name AND the key being used > for that I_T nexus. You can use sg_persist to list the keys registered > with a particular device, but since nodeid's may have changed you > might have to guess the key for a particular node (ie. the node you > run the sg_persist -C command on). 
The good news is that when you > identify the correct key it will clear all the keys. > > Ryan > > Sajesh Singh wrote: >> After updating my GFS cluster to the latest packages (as of 3/28/08) >> on an Enterprise Linux 4.6 cluster (kernel version >> 2.6.9-67.0.7.ELsmp) I am receiving scsi reservation errors whenever >> the nodes are rebooted. The node is then subsequently rebooted at >> varying intervals without any intervention. I have tried to disable >> the scsi_reserve script from startup, but it does not seem to have >> any effect. I have also tried to use the sg_persist command to clear >> all reservations with the -C option to no avail. I first noticed >> something was wrong when the 2nd node of the 2 node cluster was being >> updated. That was the first sign of the scsi reservation errors on >> the console. >> >> From my understanding persistent SCSI reservations are only needed >> if I am using the fence_scsi module. >> >> I would appreciate any guidance. >> >> Regards, >> >> Sajesh Singh >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > From RJM002 at shsu.edu Wed Apr 2 16:37:04 2008 From: RJM002 at shsu.edu (MARTI, ROBERT JESSE) Date: Wed, 2 Apr 2008 11:37:04 -0500 Subject: [Linux-cluster] Why my cluster stop to work when one node down? In-Reply-To: References: <1207148933.27447.6.camel@tuxkiller.ig.com.br><1207151977.27447.10.camel@tuxkiller.ig.com.br> Message-ID: <9F633DE6C0E04F4691DCB713AC44C94B066E4C68@EXCHANGE.SHSU.EDU> Speaking of... If I already have a cluster set up that split brained itself (but the services are still running on one, and it wont un-split brain with the other box up...) how hard would it be to add a quorum disk? I guess I could post my whole problem and let smarter people figure out what I broke. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of gordan at bobich.net Sent: Wednesday, April 02, 2008 11:13 AM To: linux clustering Subject: Re: [Linux-cluster] Why my cluster stop to work when one node down? > Nice ?Gordan!!! > > It works now!! :-p You're welcome. :) > "?Quorum" its the number minimum of nodes on the cluster? Yes, it's the minimum number of nodes required for the cluster to start. This is (n+1)/2, round up number of nodes defined in cluster.conf. This ensures that the cluster can't split-brain. In the 2-node case this needs to be adjusted which is what the two_node parameter does. There's higher risk of splitbrain, though, but you can use tie-breakers of some sort. Gordan From paolom at prisma-eng.it Wed Apr 2 17:20:58 2008 From: paolom at prisma-eng.it (Paolo Marini) Date: Wed, 02 Apr 2008 19:20:58 +0200 Subject: [Linux-cluster] Problems with SAMBA server on Centos 51 virtual xen guest with iSCSI SAN Message-ID: <47F3C07A.8090709@prisma-eng.it> I have implemented a cluster of a few xen guest with a shared GFS filesystem residing on a SAN build with openfiler to support iSCSI storage. Physical servers are 3 machines implementing a physical cluster, each one equipped with quad xeon and 4 G RAM. The network interface is based on channel bonding with LACP (on the physical hosts) having an aggregate of 2 gigabits ethernet per physical host, the switch supports LACP and has been configured accordingly. Virtual servers are based on xen nodes on top of the physical server with shared storage on iSCSI and GFS. 
The networking is based on a cluster private network (for cluster heartbeat and cluster communication + iSCSI) and an ethernet alias for the LAN to which the users are connected. One of the cluster xen nodes is used for implementing a samba PDC (no failover of the service, plain samba, single samba server on the LAN) plus ldap server; samba works with ldap for users authentication. Storage for the samba server is on the SAN. I continue to receive complaints from my users due to the fact that sometimes copying file generates errors, plus problems related to office usage (we still use the old Office 97 on some machines). The samba configuration is more or less the same as that correctly working on the previous physical machine, on which those problems were not present. The problems generate these log entries on /var/log/samba/smbd: [2008/04/02 19:00:50, 0] lib/util_sock.c:get_peer_addr(1232) getpeername failed. Error was Transport endpoint is not connected [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) getpeername failed. Error was Transport endpoint is not connected [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) getpeername failed. Error was Transport endpoint is not connected And on the client machine log also on /var/log/samba [2008/04/02 19:04:34, 0] lib/util_sock.c:read_data(534) read_data: read failure for 4 bytes to client 192.168.13.240. Error = Connection reset by peer [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) amhwq53p (192.168.13.240) closed connection to service tmp [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) amhwq53p (192.168.13.240) closed connection to service stock [2008/04/02 19:04:34, 0] lib/util_sock.c:write_data(562) write_data: write failure in writing to client 192.168.13.240. Error Broken pipe [2008/04/02 19:04:34, 0] lib/util_sock.c:send_smb(769) Error writing 75 bytes to client. -1. (Broken pipe) [2008/04/02 19:04:34, 1] smbd/service.c:make_connection_snum(1033) They seem similar to problems related to poor connectivity or problem in the network; however, these problems are new and were never found before switching to the clustered architecture. Also no problem have been found so far on the other xen nodes serving the same GFS filesystem (different dirs !) for NFS or other services. Also putting the option posix locking = no on the smb.conf file did not help. Any idea from someone else facing the same problems ? thanks, Paolo From jamesc at exa.com Wed Apr 2 18:50:13 2008 From: jamesc at exa.com (James Chamberlain) Date: Wed, 2 Apr 2008 14:50:13 -0400 Subject: [Linux-cluster] Unformatting a GFS cluster disk In-Reply-To: <1206482065.2741.83.camel@technetium.msp.redhat.com> References: <1206482065.2741.83.camel@technetium.msp.redhat.com> Message-ID: <0018D44C-35B7-4496-88F6-C8AB8F12FA84@exa.com> On Mar 25, 2008, at 5:54 PM, Bob Peterson wrote: > If it were my file system, and I didn't have a backup, and I had > data on it that I absolutely needed to get back, I personally would > use the gfs2_edit tool (assuming RHEL5, Centos5 or similar) which can > mostly operate on gfs1 file systems. The "gfs_edit" tool will also > work, but it is much more primitive than gfs2_edit (but at least it > exists on RHEL4, Centos4 and similar). Any idea what RPM gfs_edit would be in for RHEL4/CentOS 4? I've got CentOS 4.6, and I'm not finding it anywhere. 
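One quick way to check locally which package, if any, ships the tool (a sketch; the package names here are guesses for the respective releases, not something confirmed in this thread):

rpm -qa | grep -i gfs                 # which GFS-related packages are installed
rpm -ql GFS | grep -i edit            # RHEL4/CentOS 4 era package name
rpm -ql gfs-utils | grep -i edit      # RHEL5/CentOS 5 era package name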
Thanks, James From siddiqut at gmail.com Wed Apr 2 19:34:15 2008 From: siddiqut at gmail.com (Tajdar Siddiqui) Date: Wed, 2 Apr 2008 15:34:15 -0400 Subject: [Linux-cluster] Re: writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> Message-ID: <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> Hi Gordon, Thanx for your reply. Yes, this test works fine on an ext3 filesystem. The JVM's are on different nodes. The files being written/read on the 2 JVM's are different (file-names). Where does locking come into play here ? A JVM is only reading the files it creates, so there is no cross. -Tajdar -------------- next part -------------- An HTML attachment was scrubbed... URL: From jruemker at redhat.com Wed Apr 2 20:10:23 2008 From: jruemker at redhat.com (John Ruemker) Date: Wed, 02 Apr 2008 16:10:23 -0400 Subject: [Linux-cluster] Problems with SAMBA server on Centos 51 virtual xen guest with iSCSI SAN In-Reply-To: <47F3C07A.8090709@prisma-eng.it> References: <47F3C07A.8090709@prisma-eng.it> Message-ID: <47F3E82F.6030600@redhat.com> Paolo Marini wrote: > I have implemented a cluster of a few xen guest with a shared GFS > filesystem residing on a SAN build with openfiler to support iSCSI > storage. > > Physical servers are 3 machines implementing a physical cluster, each > one equipped with quad xeon and 4 G RAM. The network interface is > based on channel bonding with LACP (on the physical hosts) having an > aggregate of 2 gigabits ethernet per physical host, the switch > supports LACP and has been configured accordingly. > > Virtual servers are based on xen nodes on top of the physical server > with shared storage on iSCSI and GFS. > > The networking is based on a cluster private network (for cluster > heartbeat and cluster communication + iSCSI) and an ethernet alias for > the LAN to which the users are connected. > > One of the cluster xen nodes is used for implementing a samba PDC (no > failover of the service, plain samba, single samba server on the LAN) > plus ldap server; samba works with ldap for users authentication. > Storage for the samba server is on the SAN. > > I continue to receive complaints from my users due to the fact that > sometimes copying file generates errors, plus problems related to > office usage (we still use the old Office 97 on some machines). The > samba configuration is more or less the same as that correctly working > on the previous physical machine, on which those problems were not > present. > > The problems generate these log entries on /var/log/samba/smbd: > > [2008/04/02 19:00:50, 0] lib/util_sock.c:get_peer_addr(1232) > getpeername failed. Error was Transport endpoint is not connected > [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) > getpeername failed. Error was Transport endpoint is not connected > [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) > getpeername failed. Error was Transport endpoint is not connected > > And on the client machine log also on /var/log/samba > > [2008/04/02 19:04:34, 0] lib/util_sock.c:read_data(534) > read_data: read failure for 4 bytes to client 192.168.13.240. 
Error = > Connection reset by peer > [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) > amhwq53p (192.168.13.240) closed connection to service tmp > [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) > amhwq53p (192.168.13.240) closed connection to service stock > [2008/04/02 19:04:34, 0] lib/util_sock.c:write_data(562) > write_data: write failure in writing to client 192.168.13.240. Error > Broken pipe > [2008/04/02 19:04:34, 0] lib/util_sock.c:send_smb(769) > Error writing 75 bytes to client. -1. (Broken pipe) > [2008/04/02 19:04:34, 1] smbd/service.c:make_connection_snum(1033) > > They seem similar to problems related to poor connectivity or problem > in the network; however, these problems are new and were never found > before switching to the clustered architecture. Also no problem have > been found so far on the other xen nodes serving the same GFS > filesystem (different dirs !) for NFS or other services. > > Also putting the option > > posix locking = no > > on the smb.conf file did not help. > > Any idea from someone else facing the same problems ? > > thanks, Paolo > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Those errors are explained in http://kbase.redhat.com/faq/FAQ_45_5274.shtm John From gordan at bobich.net Wed Apr 2 21:19:11 2008 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 02 Apr 2008 22:19:11 +0100 Subject: [Linux-cluster] Re: writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> Message-ID: <47F3F84F.5020309@bobich.net> Tajdar Siddiqui wrote: > Yes, this test works fine on an ext3 filesystem. > > The JVM's are on different nodes. > > The files being written/read on the 2 JVM's are different (file-names). > Where does locking come into play here ? > > A JVM is only reading the files it creates, so there is no cross. Writing files to a directory requires a directory lock. This lock needs to be bounced back between the nodes. This will slow things down at the very least. If you can arrange your files/application so that the nodes are writing to separate directory trees, then that will undoubtedly give you better performance. What version of GFS are you using? If GFS2, try GFS1. GFS2 isn't entirely stable yet. Gordan From jruemker at redhat.com Wed Apr 2 20:38:30 2008 From: jruemker at redhat.com (John Ruemker) Date: Wed, 02 Apr 2008 16:38:30 -0400 Subject: [Linux-cluster] Unformatting a GFS cluster disk In-Reply-To: <0018D44C-35B7-4496-88F6-C8AB8F12FA84@exa.com> References: <1206482065.2741.83.camel@technetium.msp.redhat.com> <0018D44C-35B7-4496-88F6-C8AB8F12FA84@exa.com> Message-ID: <47F3EEC6.4010105@redhat.com> James Chamberlain wrote: > > On Mar 25, 2008, at 5:54 PM, Bob Peterson wrote: > >> If it were my file system, and I didn't have a backup, and I had >> data on it that I absolutely needed to get back, I personally would >> use the gfs2_edit tool (assuming RHEL5, Centos5 or similar) which can >> mostly operate on gfs1 file systems. The "gfs_edit" tool will also >> work, but it is much more primitive than gfs2_edit (but at least it >> exists on RHEL4, Centos4 and similar). > > Any idea what RPM gfs_edit would be in for RHEL4/CentOS 4? I've got > CentOS 4.6, and I'm not finding it anywhere. > AFAIK its not provided in RHEL4. In RHEL5 it would be in gfs-utils package. 
John > Thanks, > > James > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From siddiqut at gmail.com Wed Apr 2 20:58:45 2008 From: siddiqut at gmail.com (Tajdar Siddiqui) Date: Wed, 2 Apr 2008 16:58:45 -0400 Subject: [Linux-cluster] Re: writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> Message-ID: <3abaa1ce0804021358m56553dbfs9dc309d4c8bb32bd@mail.gmail.com> Hi Gordan (apologize i misspelled your name last time), Thanx for your help so far. A lame question probably: How do i figure out the gfs version: $ rpm -qa | grep GFS --returns nothing $ rpm -qa | grep gfs gfs2-utils-0.1.38-1.el5 kmod-gfs-0.1.19-7.el5_1.1 kmod-gfs-0.1.16-5.2.6.18_8.el5 gfs-utils-0.1.12-1.el5 Not sure how to figure it out. -Tajdar -------------- next part -------------- An HTML attachment was scrubbed... URL: From garromo at us.ibm.com Wed Apr 2 21:17:55 2008 From: garromo at us.ibm.com (Gary Romo) Date: Wed, 2 Apr 2008 15:17:55 -0600 Subject: [Linux-cluster] SCSI reservation conflicts after update In-Reply-To: <47F3B30F.2050502@redhat.com> Message-ID: We had a similar issue and we just removed sg3utils (orsomething like that), if your not going to use it. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com "Ryan O'Hara" Sent by: linux-cluster-bounces at redhat.com 04/02/2008 10:23 AM Please respond to linux clustering To ssingh at amnh.org, linux clustering cc Subject Re: [Linux-cluster] SCSI reservation conflicts after update I went back and investigated why this might happen. Seems that I had seen it before but could not recall how this sort of thing happens. For 4.6, the scsi_reserve script should only be run if you intend to use SCSI reservations as a fence mechanism, as you correctly pointed out at the end of your message. I believe in 4.6 scsi_reserve was incorrectly enabled by default. The real problem is that the keys used for scsi reservations are based on node ID. For this reason, it is required that nodeid be defined in the cluster.conf file for all nodes. Without this, the nodeid can change from node to node between cluster restarts, etc. The scsi_reserve and fence_scsi scripts require consistent nodeid (ie. they do not change). So I think the problem we are seeing is that running 'scsi_reserve stop' cannot work since that will attempt to remove that node's key from the devices. If that key has changed (the node ID changed), it will not find a matching registration key on the device and thus fail. The best bet is to disable scsi_reserve and to clear all scsi reservations. As you mentioned, the sg_persist command with the -C option should do the trick. I am guessing that the reason that failed for you is that you must supply the device name AND the key being used for that I_T nexus. You can use sg_persist to list the keys registered with a particular device, but since nodeid's may have changed you might have to guess the key for a particular node (ie. the node you run the sg_persist -C command on). The good news is that when you identify the correct key it will clear all the keys. 
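For reference, the listing and clearing steps described here look roughly like this with sg3_utils (the device name and key below are placeholders):

sg_persist --in --read-keys /dev/sdb          # list all registered keys
sg_persist --in --read-reservation /dev/sdb   # show the current reservation
sg_persist --out --clear --param-rk=0xKEY /dev/sdb
                                              # clear every registration and the
                                              # reservation; 0xKEY must be a key
                                              # currently registered from this host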
Ryan Sajesh Singh wrote: > After updating my GFS cluster to the latest packages (as of 3/28/08) on > an Enterprise Linux 4.6 cluster (kernel version 2.6.9-67.0.7.ELsmp) I > am receiving scsi reservation errors whenever the nodes are rebooted. > The node is then subsequently rebooted at varying intervals without any > intervention. I have tried to disable the scsi_reserve script from > startup, but it does not seem to have any effect. I have also tried to > use the sg_persist command to clear all reservations with the -C option > to no avail. I first noticed something was wrong when the 2nd node of > the 2 node cluster was being updated. That was the first sign of the > scsi reservation errors on the console. > > From my understanding persistent SCSI reservations are only needed if I > am using the fence_scsi module. > > I would appreciate any guidance. > > Regards, > > Sajesh Singh > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpeterso at redhat.com Wed Apr 2 21:40:19 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 02 Apr 2008 16:40:19 -0500 Subject: [Linux-cluster] Unformatting a GFS cluster disk In-Reply-To: <47F3EEC6.4010105@redhat.com> References: <1206482065.2741.83.camel@technetium.msp.redhat.com> <0018D44C-35B7-4496-88F6-C8AB8F12FA84@exa.com> <47F3EEC6.4010105@redhat.com> Message-ID: <1207172419.24927.37.camel@technetium.msp.redhat.com> On Wed, 2008-04-02 at 16:38 -0400, John Ruemker wrote: > James Chamberlain wrote: > > > > On Mar 25, 2008, at 5:54 PM, Bob Peterson wrote: > > > >> If it were my file system, and I didn't have a backup, and I had > >> data on it that I absolutely needed to get back, I personally would > >> use the gfs2_edit tool (assuming RHEL5, Centos5 or similar) which can > >> mostly operate on gfs1 file systems. The "gfs_edit" tool will also > >> work, but it is much more primitive than gfs2_edit (but at least it > >> exists on RHEL4, Centos4 and similar). > > > > Any idea what RPM gfs_edit would be in for RHEL4/CentOS 4? I've got > > CentOS 4.6, and I'm not finding it anywhere. > > > AFAIK its not provided in RHEL4. In RHEL5 it would be in gfs-utils > package. > > John > > > Thanks, > > > > James That's true, but I very recently (last week) built a gfs2_edit program that runs on RHEL4.6. (Yes, I know gfs2 won't run on 4.X but the gfs2_edit tool can still be used to work on gfs1 file systems.) I'm trying to get a people page worked out, and if that goes through, I'll post it there. I can send a tar/zip version of the source tree for this if anyone wants it. It's basically the same cluster source tree as 5.2, but I've modified it slightly so that it will compile on 4.6 without too much hassle. I can also easily send a x86_64 binary for RHEL4.6 gfs2_edit if that helps. A 32-bit version would be hard for me to build at the moment. The original gfs_edit is pretty primitive compared to gfs2_edit. 
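For anyone who has not used it, the read-only poking around mentioned above looks something like this (the device path is a placeholder, and the exact options vary a little between releases):

gfs2_edit -p sb /dev/your_vg/your_lv        # print the superblock
gfs2_edit -p rindex /dev/your_vg/your_lv    # print the resource group index
gfs2_edit savemeta /dev/your_vg/your_lv /tmp/fs.meta
                                            # dump the metadata (no file data)
                                            # to a file that can be sent in
                                            # for analysis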
Regards, Bob Peterson Red Hat Clustering & GFS From gordan at bobich.net Thu Apr 3 00:42:10 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 03 Apr 2008 01:42:10 +0100 Subject: [Linux-cluster] Re: writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804021358m56553dbfs9dc309d4c8bb32bd@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> <3abaa1ce0804021358m56553dbfs9dc309d4c8bb32bd@mail.gmail.com> Message-ID: <47F427E2.50706@bobich.net> Tajdar Siddiqui wrote: > Thanx for your help so far. A lame question probably: How do i figure > out the gfs version: > > $ rpm -qa | grep gfs > gfs2-utils-0.1.38-1.el5 > kmod-gfs-0.1.19-7.el5_1.1 > kmod-gfs-0.1.16-5.2.6.18_8.el5 > gfs-utils-0.1.12-1.el5 > > Not sure how to figure it out. Did you make the FS with mkfs.gfs or mkfs.gfs2? What does mount say for the FS type? Gordan From siddiqut at gmail.com Thu Apr 3 01:10:20 2008 From: siddiqut at gmail.com (Tajdar Siddiqui) Date: Wed, 2 Apr 2008 21:10:20 -0400 Subject: [Linux-cluster] Re: writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804021358m56553dbfs9dc309d4c8bb32bd@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> <3abaa1ce0804021234v6eb5b425hd591f6cf2c5f3caa@mail.gmail.com> <3abaa1ce0804021358m56553dbfs9dc309d4c8bb32bd@mail.gmail.com> Message-ID: <3abaa1ce0804021810q64f6d0afg1126a9b981876bc1@mail.gmail.com> Unfortunately, I did not create this FS so not sure what command params were used. Output of df -T : $ df -T /gfs Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/mapper/vggfs01-lvol00 gfs 104551424 8120236 96431188 8% /gfs Output of mount: $ mount /dev/mapper/vggfs01-lvol00 on /gfs type gfs (rw,hostdata=jid=1:id=196610:first=0) My guess this is GFS1? Thanx, Tajdar -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmshehzad at yahoo.com Thu Apr 3 08:05:40 2008 From: pmshehzad at yahoo.com (Mshehzad Pankhawala) Date: Thu, 3 Apr 2008 01:05:40 -0700 (PDT) Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit Message-ID: <468783.25520.qm@web45809.mail.sp1.yahoo.com> We are using desktop PCs at my institute. We have tested DRBD and Heartbeat, Now we want to use RHCS. So how do we configure manual fencing. Suggestions are welcome, Thank you.. --------------------------------- You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. -------------- next part -------------- An HTML attachment was scrubbed... URL: From maciej.bogucki at artegence.com Thu Apr 3 08:01:51 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Thu, 03 Apr 2008 10:01:51 +0200 Subject: [Linux-cluster] writing to GFS from multiple JVM's concurrently In-Reply-To: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> References: <3abaa1ce0804020830k30bce4eey2127b8687f14d912@mail.gmail.com> Message-ID: <47F48EEF.4070609@artegence.com> Tajdar Siddiqui napisa?(a): > Hi, > > We are evaluating GFS for use as a highly concurrent distributed file > system. > > What I have observed: > > When 2 JVM's (multiple Threads per Java Virtual Machine) are writing to > the same directory on GFS, on of the JVM doesn't see the files it writes > on the GFS. > The Writer Threads on JVM think they're done, but the files don't show > up on "ls" etc. > The other JVM works fine. 
> > This problem goes away if the 2 JVM's write to different directories on GFS > > OR > > Only one JVM is writing at a time. > > Any ideas on this. > > http://sourceware.org/cluster/faq.html#gfs_samefile Best Regards Maciej Bogucki From maciej.bogucki at artegence.com Thu Apr 3 08:30:28 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Thu, 03 Apr 2008 10:30:28 +0200 Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit In-Reply-To: <468783.25520.qm@web45809.mail.sp1.yahoo.com> References: <468783.25520.qm@web45809.mail.sp1.yahoo.com> Message-ID: <47F495A4.6030400@artegence.com> Mshehzad Pankhawala napisa?(a): > We are using desktop PCs at my institute. > We have tested DRBD and Heartbeat, > Now we want to use RHCS. So how do we configure manual fencing. > > Suggestions are welcome, Best Regards Maciej Bogucki From nkhare.lists at gmail.com Thu Apr 3 08:52:37 2008 From: nkhare.lists at gmail.com (Neependra Khare) Date: Thu, 03 Apr 2008 14:22:37 +0530 Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit In-Reply-To: <47F495A4.6030400@artegence.com> References: <468783.25520.qm@web45809.mail.sp1.yahoo.com> <47F495A4.6030400@artegence.com> Message-ID: <47F49AD5.9080702@gmail.com> Maciej Bogucki wrote: > Mshehzad Pankhawala napisa?(a): > >> We are using desktop PCs at my institute. >> We have tested DRBD and Heartbeat, >> Now we want to use RHCS. So how do we configure manual fencing. >> >> Suggestions are welcome, >> > > > > > > > > > > > > > > > > > > > > To complete the fencing you have to run "fence_ack_manual" command manually. For more information have a look at man page of "fence_ack_manual" . Neependra From gordan at bobich.net Thu Apr 3 08:56:29 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 3 Apr 2008 09:56:29 +0100 (BST) Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit In-Reply-To: <468783.25520.qm@web45809.mail.sp1.yahoo.com> References: <468783.25520.qm@web45809.mail.sp1.yahoo.com> Message-ID: As far as I'm aware, there is no way to use manual fencing in an automated way. You'll have to manually acknowledge that the machine has been fenced. On Thu, 3 Apr 2008, Mshehzad Pankhawala wrote: > We are using desktop PCs at my institute. > We have tested? DRBD and Heartbeat, > Now we want to use RHCS. So how do we configure manual fencing. From denisb+gmane at gmail.com Thu Apr 3 09:39:30 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 03 Apr 2008 11:39:30 +0200 Subject: [Linux-cluster] Re: fence_manual missing /tmp/fence_manual.fifo In-Reply-To: <47F0FD50.3030500@artegence.com> References: <47EBCEE8.7090905@artegence.com> <47F0FD50.3030500@artegence.com> Message-ID: Maciej Bogucki wrote: > denis napisa?(a): >> Maciej Bogucki wrote: >>>> Are you certain you want to continue? [yN] y >>>> can't open /tmp/fence_manual.fifo: No such file or directory > Do You run fence_ack_manual on the node which is master in the cluster[1]? > [1] - http://www.mail-archive.com/linux-cluster at redhat.com/msg02173.html Thanks for the reply. Well, yes this is a two node cluster, so I am doing it on the remaining node, which will then be the master node. I will however double check the next time manual fencing hits in (which isn't often anymore as the bladecenter fencing works smoothly now. 
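To pull the answers in this sub-thread together: the usual shape of a manual fencing setup is a fence_manual device referenced by every node, plus a human acknowledging the fence. A sketch, with the node and device names invented:

# fragment of /etc/cluster/cluster.conf:
#   <fencedevices>
#     <fencedevice agent="fence_manual" name="human"/>
#   </fencedevices>
#   ...and inside each clusternode:
#   <fence>
#     <method name="1">
#       <device name="human" nodename="node1"/>
#     </method>
#   </fence>
#
# When a node fails, fencing blocks until someone verifies that the node
# really is down (powered off, unplugged) and then acknowledges it on a
# surviving node:
fence_ack_manual -n node1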
Regards -- Denis From ben.yarwood at juno.co.uk Thu Apr 3 11:16:23 2008 From: ben.yarwood at juno.co.uk (Ben Yarwood) Date: Thu, 3 Apr 2008 12:16:23 +0100 Subject: [Linux-cluster] gfs_fsck memory allocation Message-ID: <02f401c8957c$2b9d1fa0$82d75ee0$@yarwood@juno.co.uk> I'm trying to run gfs_fsck on a 16TB file system and I keep getting the following message Initializing fsck Unable to allocate bitmap of size 520093697 This system doesn't have enough memory + swap space to fsck this file system. Additional memory needed is approximately: 5952MB Please increase your swap space by that amount and run gfs_fsck again. I have increased the swap size to 16GB but I still keep getting the message. Does anyone have any suggestions? From npf-mlists at eurotux.com Thu Apr 3 14:25:25 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Thu, 3 Apr 2008 15:25:25 +0100 Subject: [Linux-cluster] Problem in clvmd and iscsi-target Message-ID: <200804031525.25980.npf-mlists@eurotux.com> Hi, There is a race condition in iscsi-target and clvmd that does not allow me to export a volume by iscsi and use it localy in clvmd. I have two servers "black" and "gray". "Gray" has two drives hda (for filesystem) and hdb (that is going to be exported through iscsi to "black"). Then both machines are part of a 2 node cluster to use clvmd. root gray ~ # /etc/init.d/iscsi-target start Starting iSCSI target service: [ OK ] root gray ~ # /etc/init.d/clvmd start Starting clvmd: [ OK ] Activating VGs: No volume groups found [ OK ] root gray ~ # pvcreate /dev/hdb Can't open /dev/hdb exclusively. Mounted filesystem? root gray ~ # /etc/init.d/iscsi-target stop Stopping iSCSI target service: [ OK ] root gray ~ # pvcreate /dev/hdb Physical volume "/dev/hdb" successfully created root gray ~ # I cannot use clvmd and iscsi-target on the same machine? If i create a logical volume is is activated in "black" lvdisplay -C LV VG Attr LSize Origin Snap% Move Log Copy% teste2 teste -wi-a- 1.00G and in "gray" is disabled: root gray ~ # lvdisplay -C LV VG Attr LSize Origin Snap% Move Log Copy% teste2 teste -wi-d- 1.00G Any ideas? Thanks Nuno Fernandes From rpeterso at redhat.com Thu Apr 3 15:19:20 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 03 Apr 2008 10:19:20 -0500 Subject: [Linux-cluster] gfs_fsck memory allocation In-Reply-To: <02f401c8957c$2b9d1fa0$82d75ee0$@yarwood@juno.co.uk> References: <02f401c8957c$2b9d1fa0$82d75ee0$@yarwood@juno.co.uk> Message-ID: <1207235960.24927.72.camel@technetium.msp.redhat.com> On Thu, 2008-04-03 at 12:16 +0100, Ben Yarwood wrote: > I'm trying to run gfs_fsck on a 16TB file system and I keep getting the following message > > Initializing fsck > Unable to allocate bitmap of size 520093697 > This system doesn't have enough memory + swap space to fsck this file system. > Additional memory needed is approximately: 5952MB > Please increase your swap space by that amount and run gfs_fsck again. > > I have increased the swap size to 16GB but I still keep getting the message. Does anyone have any suggestions? Hi Ben, The gfs_fsck needs one byte per block in each bitmap. That message indicates that it tried to allocate a chunk of 520MB of memory and got an error on it. IIRC, the biggest RG size is 2GB, and would therefore require at most a chunk of 512K. (Assuming 4K blocks and assuming I did the math correctly, which I won't promise!) a 520MB chunk is big enough to hold an entire RG; much bigger than a bitmap for one. So this error is most likely caused by corruption in your system rindex file. 
Perhaps you should do gfs_tool rindex and look for anomalies. I'm planning to do some fixes to gfs_fsck to handle more cases like this, but that will take some time to resolve. If you send in your metadata (gfs2_tool savemeta), that might help me in this task. Regards, Bob Peterson Red Hat Clustering & GFS From theophanis_kontogiannis at yahoo.gr Thu Apr 3 08:44:56 2008 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Thu, 3 Apr 2008 11:44:56 +0300 Subject: [Linux-cluster] GFS2 error? Message-ID: <00e201c89567$0c88ef50$9601a8c0@corp.netone.gr> Hi all, Anybody knows what this is? GFS2: fsid=tweety:gfs0.0: fatal: invalid metadata block GFS2: fsid=tweety:gfs0.0: bh = 183677 (magic number) GFS2: fsid=tweety:gfs0.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 438 GFS2: fsid=tweety:gfs0.0: about to withdraw this file system GFS2: fsid=tweety:gfs0.0: telling LM to withdraw GFS2: fsid=tweety:gfs0.0: withdrawn I have the GFS2 fs on DRBD but DRBD reported no errors. Thank you all Theophanis Kontogiannis -------------- next part -------------- An HTML attachment was scrubbed... URL: From npf-mlists at eurotux.com Thu Apr 3 17:11:37 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Thu, 3 Apr 2008 18:11:37 +0100 Subject: [Linux-cluster] Re: [Iscsitarget-devel] Problem in clvmd and iscsi-target In-Reply-To: References: Message-ID: <200804031811.38493.npf-mlists@eurotux.com> On Thursday 03 April 2008 15:56:52 Ross S. W. Walker wrote: > You have given lots of LVM info, but no iscsi-target info. > > What version? A copy of your ietd.conf. Output of /proc/net/iet/volume Sorry :) iscsitarget-0.4.16-1 ietd.conf Target iqn.2007-04.com.eurotux.dc.gray:storage.disk.hdb Lun 0 Path=/dev/hdb,Type=blockio,ScsiId=95afc14465efeb27 Alias Test InitialR2T No ImmediateData Yes MaxRecvDataSegmentLength 16384 MaxXmitDataSegmentLength 16384 #MaxBurstLength 262144 #FirstBurstLength 65536 MaxOutstandingR2T 8 Wthreads 8 Output of proc data: tid:1 name:iqn.2007-04.com.eurotux.dc.gray:storage.disk.hdb lun:0 state:0 iotype:blockio iomode:wt path:/dev/hdb Meanwhile i narrow it down to a iscsi-target problem. Doing strace i saw it requires exclusive open. So solve it it created an sort of an hack.. :) from hdb and using device-mapper i create a linear mapping to 2 devices: hdb-int hdb-ext Next i put hdb-ext in ietd.conf and hdb-int in lvm.conf. iscsi-target still opens in exclusive mode but it only opens hdb-ext device. clvmd uses hdb-int that has no exclusive lock. Another way that we rejected is to put "gray" machine exporting the iscsi volume to it self also and using that device in clvmd. This option also worked but has less performance as all local contend has to be encoded to iscsi and decoded in the same machine. The problem of iscsi-target opening the device excluse remais. Thanks, Nuno Fernandes > > If you export /dev/hdb via iscsi it will not be accessible for anything > else. > > -Ross > > > ----- Original Message ----- > From: iscsitarget-devel-bounces at lists.sourceforge.net > To: 'linux clustering' > ; iscsitarget-devel at lists.sourceforge.net > Sent: Thu Apr 03 10:25:25 2008 > Subject: [Iscsitarget-devel] Problem in clvmd and iscsi-target > > Hi, > > There is a race condition in iscsi-target and clvmd that does not allow me > to export a volume by iscsi and use it localy in clvmd. > > I have two servers "black" and "gray". "Gray" has two drives hda (for > filesystem) and hdb (that is going to be exported through iscsi to > "black"). 
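(The linear-mapping workaround described earlier in this message comes down to something like the following. The device and mapping names are the ones mentioned there; everything else, including the lvm.conf filter, is an assumption.)

SECTORS=$(cat /sys/block/hdb/size)    # size of /dev/hdb in 512-byte sectors
echo "0 $SECTORS linear /dev/hdb 0" | dmsetup create hdb-ext
echo "0 $SECTORS linear /dev/hdb 0" | dmsetup create hdb-int
# point ietd.conf at /dev/mapper/hdb-ext, and keep LVM away from the raw
# disk and the exported mapping with a filter in lvm.conf, e.g.:
#   filter = [ "a|^/dev/mapper/hdb-int$|", "r|^/dev/hdb.*|", "r|^/dev/mapper/hdb-ext$|" ]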
Then both machines are part of a 2 node cluster to use clvmd. > > root gray ~ # /etc/init.d/iscsi-target start > Starting iSCSI target service: [ OK ] > root gray ~ # /etc/init.d/clvmd start > Starting clvmd: [ OK ] > Activating VGs: No volume groups found > [ OK ] > root gray ~ # pvcreate /dev/hdb > Can't open /dev/hdb exclusively. Mounted filesystem? > root gray ~ # /etc/init.d/iscsi-target stop > Stopping iSCSI target service: [ OK ] > root gray ~ # pvcreate /dev/hdb > Physical volume "/dev/hdb" successfully created > root gray ~ # > > I cannot use clvmd and iscsi-target on the same machine? If i create a > logical volume is is activated in "black" > > lvdisplay -C > LV VG Attr LSize Origin Snap% Move Log Copy% > teste2 teste -wi-a- 1.00G > > and in "gray" is disabled: > > root gray ~ # lvdisplay -C > LV VG Attr LSize Origin Snap% Move Log Copy% > teste2 teste -wi-d- 1.00G > > Any ideas? > > Thanks > Nuno Fernandes > > ------------------------------------------------------------------------- > Check out the new SourceForge.net Marketplace. > It's the best place to buy or sell services for > just about anything Open Source. > http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplac >e _______________________________________________ > Iscsitarget-devel mailing list > Iscsitarget-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/iscsitarget-devel > > ______________________________________________________________________ > This e-mail, and any attachments thereto, is intended only for use by > the addressee(s) named herein and may contain legally privileged > and/or confidential information. If you are not the intended recipient > of this e-mail, you are hereby notified that any dissemination, > distribution or copying of this e-mail, and any attachments thereto, > is strictly prohibited. If you have received this e-mail in error, > please immediately notify the sender and permanently delete the > original and any copy or printout thereof. From mcse47 at hotmail.com Thu Apr 3 18:43:00 2008 From: mcse47 at hotmail.com (Tracey Flanders) Date: Thu, 3 Apr 2008 14:43:00 -0400 Subject: [Linux-cluster] Is there a fencing agent I can use for iscsi ?(GFS and iSCSI) Message-ID: I have a 2 node cluster serverA and serverB using iSCSI for the shared disk. Both mount to ServerC which hosts the iscsi target. They both mount to the same iscsi target using GFS as the filesystem. I have setup RHCS and everything is working great except I do not have a proper fencing agent. Basically both nodes have to be online in order for the cluster to come up. I was wondering if anyone has written a iscsi fencing agent that I could use. I saw one written in perl that ssh'd into the node and added an iptables entry in order to fence the server from the iscsi target. It was from 2004 and didn't run correctly on my machine. Does anyone have any ideas? Or should I try and salvage the one I found and fix it up? Thanks. Tracey Flanders _________________________________________________________________ Get in touch in an instant. Get Windows Live Messenger now. 
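(On the iptables-over-ssh idea above: the general shape of such an agent is small. A rough, untested sketch follows; fenced hands an agent its options as key=value lines on standard input, and the option names used here, target_host and victim_ip, are invented for the example.)

#!/bin/sh
# rough sketch only, not a production fence agent
TARGET="" VICTIM=""
while read line; do
    case "$line" in
        target_host=*) TARGET=${line#target_host=} ;;
        victim_ip=*)   VICTIM=${line#victim_ip=} ;;
    esac
done
[ -n "$TARGET" ] && [ -n "$VICTIM" ] || exit 1
# on the iSCSI target host, drop the failed initiator's traffic to the
# iSCSI port so it can no longer write to the shared LUN
ssh root@"$TARGET" "iptables -I INPUT -s $VICTIM -p tcp --dport 3260 -j DROP"
exit $?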
http://www.windowslive.com/messenger/overview.html?ocid=TXT_TAGLM_WL_Refresh_getintouch_042008 From lhh at redhat.com Thu Apr 3 19:19:52 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Apr 2008 15:19:52 -0400 Subject: [Linux-cluster] VIP's on mixed subnets In-Reply-To: References: Message-ID: <1207250392.15132.46.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-04-01 at 11:38 -0600, Gary Romo wrote: > > In my cluster all of my servers NICs are bonded. > Up until recently all of my VIPs (for resources/services) were in the > same subnet. > Is it ok that VIPs be in mixed subnets? Thanks. Yes, as long as you have an IP on the subnet already. e.g. if you have eth1 on 192.168.1.0/24 and eth2 on 10.0.0.0/8, you can put IPs for those two subnets in cluster.conf. -- Lon From lhh at redhat.com Thu Apr 3 19:27:43 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 03 Apr 2008 15:27:43 -0400 Subject: [Linux-cluster] Using GFS and DLM without RHCS In-Reply-To: <47F16E7F.3000903@bobich.net> References: <47F113C5020000C800008B0C@mail-int.health-first.org> <47F16E7F.3000903@bobich.net> Message-ID: <1207250863.15132.51.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-04-01 at 00:06 +0100, Gordan Bobic wrote: > Danny Wall wrote: > > I was wondering if it is possible to run GFS on several machines with a > > shared GFS LUN, but not use full clustering like RHCS. From the FAQs: > > First of all, what's the problem with having RHCS running? It doesn't > mean you have to use it to handle resources failing over. You can run it > all in active/active setup with load balancing in front. Well, the direct answer is you don't need rgmanager (failover stuff) at all to run GFS. GFS is a client of the cluster infrastructure, and really, it's a peer of rgmanager (though you can use rgmanager to mount it if you want). You do, however, need to run the cluster infrastructure stack (openais/cman/dlm/fencing/etc.) to run GFS on multiple nodes if those nodes are accessing the same software. -- Lon From garromo at us.ibm.com Thu Apr 3 20:56:22 2008 From: garromo at us.ibm.com (Gary Romo) Date: Thu, 3 Apr 2008 14:56:22 -0600 Subject: [Linux-cluster] VIP's on mixed subnets In-Reply-To: <1207250392.15132.46.camel@ayanami.boston.devel.redhat.com> Message-ID: Thanks Lon. In my case I do not have another IP with another subnet. So, the only way to do this would be by obtaining another NIC with the other SUBNET on it, correct? e.g. My eth0 and eth1 are bonded with the same IP address, on the same subnet (192.168.0./24) I would need to order a new NIC (eth2), which would be on the other subnet (10.0.0.0/8), in order to use VIP from the 10.0 subnet? Correct? -Gary Lon Hohberger Sent by: linux-cluster-bounces at redhat.com 04/03/2008 01:19 PM Please respond to linux clustering To linux clustering cc Subject Re: [Linux-cluster] VIP's on mixed subnets On Tue, 2008-04-01 at 11:38 -0600, Gary Romo wrote: > > In my cluster all of my servers NICs are bonded. > Up until recently all of my VIPs (for resources/services) were in the > same subnet. > Is it ok that VIPs be in mixed subnets? Thanks. Yes, as long as you have an IP on the subnet already. e.g. if you have eth1 on 192.168.1.0/24 and eth2 on 10.0.0.0/8, you can put IPs for those two subnets in cluster.conf. -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From david at eciad.ca Thu Apr 3 21:29:55 2008 From: david at eciad.ca (David Ayre) Date: Thu, 3 Apr 2008 14:29:55 -0700 Subject: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel In-Reply-To: <47567.10.8.105.69.1207097461.squirrel@secure.ntsg.umt.edu> References: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> <47567.10.8.105.69.1207097461.squirrel@secure.ntsg.umt.edu> Message-ID: <7E877A62-DDED-46F8-AB00-10B396AA9791@eciad.ca> Some progress... We had another dlm_sendd lockup yesterday which prompted us to do some reworking of our file sharing. Previously we had both SMB and NFS services competing for GFS resources on this particular node. We thought perhaps it was this combination which may have provoked the lockups... so, we moved things around with the help of another server in our GFS cluster. Previously we had: Machine A (nfs and smb services sitting on top of gfs) NFS SMB GFS And switched things around to this: Machine A SMB NFS -> Machine B Machine B NFS GFS Basically we moved all NFS mounts to machine B.... NFS is the only file sharing service using GFS on this machine, and changed Machine A to use an NFS mount to machine B. This way we don't have any nodes with both SMB and NFS services running on top of GFS. Previously we had 1-2 lockups a day, but today nothing... so far so good. Not sure if this configuration will work for you... let me know if you need any further clarification. d On 1-Apr-08, at 5:51 PM, Andrew A. Neuschwander wrote: > My symptoms are similar. dlm_send sits on all of the cpu. Top shows > the > cpu spending nearly all of it's time in sys or interrupt handling. > Disk > and network I/O isn't very high (as seen via iostat and iptraf). But > SMB/NFS throughput and latency are horrible. Context switches per > second > as seen by vmstat are in the 20,000+ range (I don't now if this is > high > though, I haven't really paid attention to this in the past). Nothing > crashes, and it is still able to serve data (very slowly), and > eventually > the load and latency recovers. > > As an aside, does anyone know how to _view_ the resource group size > after > file system creation on GFS? > > Thanks, > -Andrew > > > On Tue, April 1, 2008 6:30 pm, David Ayre wrote: >> What do you mean by pounded exactly ? >> >> We have an ongoing issue, similar... when we have about a dozen users >> using both smb/nfs, and at some seemingly random point in time our >> dlm_senddd chews up 100% of the CPU... then dies down at on its own >> after quite a while. Killing SMB processes, shutting down SMB didn't >> seem to have any affect... only a reboot cures it. I've seen this >> described (if this is the same issue) as a "soft lockup" as it does >> seem to come back to life: >> >> http://lkml.org/lkml/2007/10/4/137 >> >> We've been assuming its a kernel/dlm version as we are running >> 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8 >> >> we were going to try a kernel update this week... but you seem to be >> using a later version and still have this problem ? >> >> Could you elaborate on "getting pounded by dlm" ? I've posted about >> this on this list in the past but received no assistance. >> >> >> >> >> On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote: >> >>> I have a GFS cluster with one node serving files via smb and nfs. >>> Under >>> fairly light usage (5-10 users) the cpu is getting pounded by dlm. I >>> am >>> using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). 
This >>> sounds >>> like the dlm issue mentioned back in March of last year >>> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html >>> ) >>> that was resolved in 2.6.21. >>> >>> Has (or will) this fix be back ported to the current el5 kernel? >>> Will it >>> be in RHEL5.2? What is the easiest way for me to get this fix? >>> >>> Also, if I try a newer kernel on this node, will there be any harm >>> in the >>> other nodes using their current kernel? >>> >>> Thanks, >>> -Andrew >>> -- >>> Andrew A. Neuschwander, RHCE >>> Linux Systems Administrator >>> Numerical Terradynamic Simulation Group >>> College of Forestry and Conservation >>> The University of Montana >>> http://www.ntsg.umt.edu >>> andrew at ntsg.umt.edu - 406.243.6310 >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ >> David Ayre >> Programmer/Analyst - Information Technlogy Services >> Emily Carr Institute of Art and Design >> Vancouver, B.C. Canada >> 604-844-3875 / david at eciad.ca >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ David Ayre Programmer/Analyst - Information Technlogy Services Emily Carr Institute of Art and Design Vancouver, B.C. Canada 604-844-3875 / david at eciad.ca From andrew at ntsg.umt.edu Thu Apr 3 22:35:41 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Thu, 3 Apr 2008 16:35:41 -0600 (MDT) Subject: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel In-Reply-To: <7E877A62-DDED-46F8-AB00-10B396AA9791@eciad.ca> References: <33710.10.8.105.69.1207095555.squirrel@secure.ntsg.umt.edu> <47567.10.8.105.69.1207097461.squirrel@secure.ntsg.umt.edu> <7E877A62-DDED-46F8-AB00-10B396AA9791@eciad.ca> Message-ID: <36955.10.8.105.69.1207262141.squirrel@secure.ntsg.umt.edu> Dave, Thanks for the update. I had considered that and I'm setup to be able to do it. Now that someone else has tried with positive results, I think I'll give it a try. Thanks, -Andrew On Thu, April 3, 2008 3:29 pm, David Ayre wrote: > Some progress... > > We had another dlm_sendd lockup yesterday which prompted us to do some > reworking of our file sharing. Previously we had both SMB and NFS > services competing for GFS resources on this particular node. We > thought perhaps it was this combination which may have provoked the > lockups... so, we moved things around with the help of another server > in our GFS cluster. > > Previously we had: > > Machine A (nfs and smb services sitting on top of gfs) > NFS SMB > GFS > > And switched things around to this: > > Machine A > SMB > NFS -> Machine B > > Machine B > NFS > GFS > > Basically we moved all NFS mounts to machine B.... NFS is the only > file sharing service using GFS on this machine, and changed Machine A > to use an NFS mount to machine B. This way we don't have any nodes > with both SMB and NFS services running on top of GFS. > > Previously we had 1-2 lockups a day, but today nothing... so far so > good. Not sure if this configuration will work for you... let me > know if you need any further clarification. > > d > > > On 1-Apr-08, at 5:51 PM, Andrew A. Neuschwander wrote: > >> My symptoms are similar. dlm_send sits on all of the cpu. 
Top shows >> the >> cpu spending nearly all of it's time in sys or interrupt handling. >> Disk >> and network I/O isn't very high (as seen via iostat and iptraf). But >> SMB/NFS throughput and latency are horrible. Context switches per >> second >> as seen by vmstat are in the 20,000+ range (I don't now if this is >> high >> though, I haven't really paid attention to this in the past). Nothing >> crashes, and it is still able to serve data (very slowly), and >> eventually >> the load and latency recovers. >> >> As an aside, does anyone know how to _view_ the resource group size >> after >> file system creation on GFS? >> >> Thanks, >> -Andrew >> >> >> On Tue, April 1, 2008 6:30 pm, David Ayre wrote: >>> What do you mean by pounded exactly ? >>> >>> We have an ongoing issue, similar... when we have about a dozen users >>> using both smb/nfs, and at some seemingly random point in time our >>> dlm_senddd chews up 100% of the CPU... then dies down at on its own >>> after quite a while. Killing SMB processes, shutting down SMB didn't >>> seem to have any affect... only a reboot cures it. I've seen this >>> described (if this is the same issue) as a "soft lockup" as it does >>> seem to come back to life: >>> >>> http://lkml.org/lkml/2007/10/4/137 >>> >>> We've been assuming its a kernel/dlm version as we are running >>> 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8 >>> >>> we were going to try a kernel update this week... but you seem to be >>> using a later version and still have this problem ? >>> >>> Could you elaborate on "getting pounded by dlm" ? I've posted about >>> this on this list in the past but received no assistance. >>> >>> >>> >>> >>> On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote: >>> >>>> I have a GFS cluster with one node serving files via smb and nfs. >>>> Under >>>> fairly light usage (5-10 users) the cpu is getting pounded by dlm. I >>>> am >>>> using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). This >>>> sounds >>>> like the dlm issue mentioned back in March of last year >>>> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html >>>> ) >>>> that was resolved in 2.6.21. >>>> >>>> Has (or will) this fix be back ported to the current el5 kernel? >>>> Will it >>>> be in RHEL5.2? What is the easiest way for me to get this fix? >>>> >>>> Also, if I try a newer kernel on this node, will there be any harm >>>> in the >>>> other nodes using their current kernel? >>>> >>>> Thanks, >>>> -Andrew >>>> -- >>>> Andrew A. Neuschwander, RHCE >>>> Linux Systems Administrator >>>> Numerical Terradynamic Simulation Group >>>> College of Forestry and Conservation >>>> The University of Montana >>>> http://www.ntsg.umt.edu >>>> andrew at ntsg.umt.edu - 406.243.6310 >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ >>> David Ayre >>> Programmer/Analyst - Information Technlogy Services >>> Emily Carr Institute of Art and Design >>> Vancouver, B.C. Canada >>> 604-844-3875 / david at eciad.ca >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~ > David Ayre > Programmer/Analyst - Information Technlogy Services > Emily Carr Institute of Art and Design > Vancouver, B.C. 
Canada > 604-844-3875 / david at eciad.ca > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- Andrew A. Neuschwander, RHCE Linux Systems Administrator Numerical Terradynamic Simulation Group College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 From andrew at ntsg.umt.edu Thu Apr 3 23:06:22 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Thu, 3 Apr 2008 17:06:22 -0600 (MDT) Subject: [Linux-cluster] gfs mount options and tuneables Message-ID: <42586.10.8.105.69.1207263982.squirrel@secure.ntsg.umt.edu> How important is it that all members of a gfs cluster have the same mount options and tunable values set? Is having different options safe in general or is the safety dependent on the specific option in question? Thanks, -Andrew -- Andrew A. Neuschwander, RHCE Linux Systems Administrator Numerical Terradynamic Simulation Group College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 From theophanis_kontogiannis at yahoo.gr Fri Apr 4 00:10:52 2008 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Fri, 4 Apr 2008 03:10:52 +0300 Subject: [Linux-cluster] fsck.gfs2 seg fault Message-ID: <000501c895e8$5f13e5f0$9601a8c0@corp.netone.gr> Hello all I have RHEL 5.1 with 2.6.18 kernel and fsck.gfs2 (GFS2 fsck 0.1.38) always seg faults at 99% with: fsck.gfs2[8245]: segfault at 0000000000000018 rip 00000000004047db rsp 00007ffffbabecb0 error 4 Any ideas on that? Thank you all for your time Theophanis Kontogiannis -------------- next part -------------- An HTML attachment was scrubbed... URL: From theophanis_kontogiannis at yahoo.gr Thu Apr 3 23:15:54 2008 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Fri, 4 Apr 2008 02:15:54 +0300 Subject: [Linux-cluster] Problem with 2 node cluster and GFS2 Message-ID: <000001c895e0$ac2aacf0$9601a8c0@corp.netone.gr> Hello all, Any ideas on what the following is? GFS2: fsid=: Trying to join cluster "lock_dlm", "tweety:gfs0" GFS2: fsid=tweety:gfs0.0: Joined cluster. Now mounting FS... GFS2: fsid=tweety:gfs0.0: jid=0, already locked for use GFS2: fsid=tweety:gfs0.0: jid=0: Looking at journal... GFS2: fsid=tweety:gfs0.0: jid=0: Acquiring the transaction lock... GFS2: fsid=tweety:gfs0.0: jid=0: Replaying journal... GFS2: fsid=tweety:gfs0.0: jid=0: Replayed 4 of 4 blocks GFS2: fsid=tweety:gfs0.0: jid=0: Found 0 revoke tags GFS2: fsid=tweety:gfs0.0: jid=0: Journal replayed in 1s GFS2: fsid=tweety:gfs0.0: jid=0: Done GFS2: fsid=tweety:gfs0.0: jid=1: Trying to acquire journal lock... GFS2: fsid=tweety:gfs0.0: jid=1: Looking at journal... GFS2: fsid=tweety:gfs0.0: jid=1: Done GFS2: fsid=tweety:gfs0.0: jid=2: Trying to acquire journal lock... GFS2: fsid=tweety:gfs0.0: jid=2: Looking at journal... GFS2: fsid=tweety:gfs0.0: jid=2: Done GFS2: fsid=tweety:gfs0.0: jid=3: Trying to acquire journal lock... GFS2: fsid=tweety:gfs0.0: jid=3: Looking at journal... 
GFS2: fsid=tweety:gfs0.0: jid=3: Done GFS2: fsid=tweety:gfs0.0: fatal: invalid metadata block GFS2: fsid=tweety:gfs0.0: bh = 162602 (magic number) GFS2: fsid=tweety:gfs0.0: function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 438 GFS2: fsid=tweety:gfs0.0: about to withdraw this file system GFS2: fsid=tweety:gfs0.0: telling LM to withdraw GFS2: fsid=tweety:gfs0.0: withdrawn Call Trace: [] :gfs2:gfs2_lm_withdraw+0xc1/0xd0 [] sync_buffer+0x0/0x3f [] out_of_line_wait_on_bit+0x6c/0x78 [] wake_bit_function+0x0/0x23 [] :gfs2:gfs2_meta_check_ii+0x2c/0x38 [] :gfs2:gfs2_meta_indirect_buffer+0x1e3/0x284 [] :gfs2:gfs2_inode_refresh+0x22/0x2b9 [] :gfs2:inode_go_lock+0x29/0x57 [] :gfs2:glock_wait_internal+0x1e3/0x259 [] :gfs2:gfs2_glock_nq+0x1ae/0x1d4 [] :gfs2:gfs2_getattr+0x7d/0xc3 [] :gfs2:gfs2_getattr+0x75/0xc3 [] vfs_getattr+0x2d/0xa9 [] vfs_lstat_fd+0x2f/0x47 [] sys_newlstat+0x19/0x31 [] tracesys+0x71/0xe0 [] tracesys+0xd5/0xe0 I see it on both nodes when I try to access a particular folder from either node Thank you all Theophanis Kontogiannis -------------- next part -------------- An HTML attachment was scrubbed... URL: From Christopher.Barry at qlogic.com Fri Apr 4 03:23:27 2008 From: Christopher.Barry at qlogic.com (christopher barry) Date: Thu, 03 Apr 2008 23:23:27 -0400 Subject: [Linux-cluster] Unformatting a GFS cluster disk In-Reply-To: <47EFEFE9.9040903@gmail.com> References: <47EA6EB7.1090108@gmail.com> <1206545696.5336.48.camel@localhost> <20080326205842.GA22083@nlxdcldnl2.cl.intel.com> <1206646016.29968.60.camel@localhost> <20080328144225.GA12231@nlxdcldnl2.cl.intel.com> <1206733914.5433.39.camel@localhost> <47EFEFE9.9040903@gmail.com> Message-ID: <1207279407.6233.6.camel@localhost> On Sun, 2008-03-30 at 14:54 -0500, Wendy Cheng wrote: snip... > In general, GFS backup from Linux side during run time has been a pain, > mostly because of its slowness and the process has to walk thru the > whole filesystem to read every single file that ends up accumulating > non-trivial amount of cached glocks and memory. For a sizable filesystem > (say in TBs range like yours), past experiences have shown that after > backup(s), the filesystem latency can go up to an unacceptable level > unless its glocks are trimmed. There is a tunable specifically written > for this purpose (glock_purge - introduced via RHEL 4.5 ) though. What should I be setting glock_purge to? snip... > The thinking here is to leverage the embedded Netapp copy-on-write > feature to speed up the backup process with reasonable disk space > requirement. The snapshot volume and the cloned lun shouldn't take much > disk space and we can turn on gfs readahead and glock_purge tunables > with minimum interruptions to the original gfs volume. The caveat here > is GFS-mounting the cloned lun - for one, gfs itself at this moment > doesn't allow mounting of multiple devices that have the same filesystem > identifiers (the -t value you use during mkfs time e.g. > "cluster-name:filesystem-name") on the same node - but it can be fixed > (by rewriting the filesystem ID and lock protocol - I will start to test > out the described backup script and a gfs kernel patch next week). Also > as any tape backup from linux host, you should not expect an image of > gfs mountable device (when retrieving from tape) - it is basically a > collection of all files residing on the gfs filesystem when the backup > events take places. > > Will the above serve your need ? Maybe other folks have (other) better > ideas ? 
This sounds exactly like what I can use - and it's got to be useful for everyone with a NetApp and gfs. Thanks for doing this! Let me know how I can help. Regards, -C From Alain.Moulle at bull.net Fri Apr 4 06:32:36 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Fri, 04 Apr 2008 08:32:36 +0200 Subject: [Linux-cluster] CS5/ best way to monitor other eth networks Message-ID: <47F5CB84.3070901@bull.net> Hi Which is the best way to monitor other eth networks than the one for heart-beat ? I don't think it's possible with heuristics if we have already set two ping on the same network to check and avoid dual-fencing. would it be to create a service with a status target which will ping the eth network to be monitored (quite same way as both pings in heuristics for heart-beat network) ? or would it be a cron to check periodically the network and to do poweroff -f in case of failure ? Thanks for piece of advice. Regards Alain Moull? From maciej.bogucki at artegence.com Fri Apr 4 07:50:20 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Fri, 04 Apr 2008 09:50:20 +0200 Subject: [Linux-cluster] Problem in clvmd and iscsi-target In-Reply-To: <200804031525.25980.npf-mlists@eurotux.com> References: <200804031525.25980.npf-mlists@eurotux.com> Message-ID: <47F5DDBC.1040307@artegence.com> Nuno Fernandes napisa?(a): > Hi, > > There is a race condition in iscsi-target and clvmd that does not allow me to > export a volume by iscsi and use it localy in clvmd. > > I have two servers "black" and "gray". "Gray" has two drives hda (for > filesystem) and hdb (that is going to be exported through iscsi to "black"). > Then both machines are part of a 2 node cluster to use clvmd. > > root gray ~ # /etc/init.d/iscsi-target start > Starting iSCSI target service: [ OK ] > root gray ~ # /etc/init.d/clvmd start > Starting clvmd: [ OK ] > Activating VGs: No volume groups found > [ OK ] > root gray ~ # pvcreate /dev/hdb > Can't open /dev/hdb exclusively. Mounted filesystem? > root gray ~ # /etc/init.d/iscsi-target stop > Stopping iSCSI target service: [ OK ] > root gray ~ # pvcreate /dev/hdb > Physical volume "/dev/hdb" successfully created > root gray ~ # > > I cannot use clvmd and iscsi-target on the same machine? If i create a logical > volume is is activated in "black" > > lvdisplay -C > LV VG Attr LSize Origin Snap% Move Log Copy% > teste2 teste -wi-a- 1.00G > > and in "gray" is disabled: > > root gray ~ # lvdisplay -C > LV VG Attr LSize Origin Snap% Move Log Copy% > teste2 teste -wi-d- 1.00G > > Any ideas? Hello, This is strange to me. I have naver done it but I'm sure it is possible to do! There is nothing special in /etc/init.d/iscsi-target what could wrong. Could You send Your /etc/ietd.conf. I think that You have exported /dev/hdb via iscsi, ant here is Your problem. Best Regards Maciej Bogucki From maciej.bogucki at artegence.com Fri Apr 4 08:02:40 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Fri, 04 Apr 2008 10:02:40 +0200 Subject: [Linux-cluster] Is there a fencing agent I can use for iscsi ?(GFS and iSCSI) In-Reply-To: References: Message-ID: <47F5E0A0.9080709@artegence.com> > Or should I try and salvage the one I found and fix it up? Thanks. Hello, I think that fixing this script is the best way to do it with linux iscsi-target. 
Best Regards Maciej Bogucki From pmshehzad at yahoo.com Fri Apr 4 08:28:32 2008 From: pmshehzad at yahoo.com (Mshehzad Pankhawala) Date: Fri, 4 Apr 2008 01:28:32 -0700 (PDT) Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit Message-ID: <553769.13176.qm@web45812.mail.sp1.yahoo.com> Thanks for your kind reply, I have successfully configured two node cluster using system-config-cluster with given settings. Now trying to use drbd with RHCS. Regards MohammedShehzad --------------------------------- You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. --------------------------------- You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes.russek at io-consulting.net Fri Apr 4 08:41:00 2008 From: johannes.russek at io-consulting.net (jr) Date: Fri, 04 Apr 2008 10:41:00 +0200 Subject: [Linux-cluster] Is there a fencing agent I can use for iscsi ?(GFS and iSCSI) In-Reply-To: References: Message-ID: <1207298460.15409.39.camel@admc.win-rar.local> > I was wondering if anyone has written a iscsi fencing agent that I could use. I saw one written in perl that ssh'd into the node and added an iptables entry in order to fence the server from the iscsi target. It was from 2004 and didn't run correctly on my machine. Does anyone have any ideas? Or should I try and salvage the one I found and fix it up? Thanks. if you need to use it (as suggested in that other reply), i'd make sure it doesn't connect to a node but to the iSCSI target and adds the firewall rules there :) or even better if you have a managed switch in between where you can simply disable the ethernet port (or even better, have iSCSI on a separate vlan and remove the port from that vlan) via an ssh script or maybe snmp or whatever. enjoy, johannes > Tracey Flanders > > _________________________________________________________________ > Get in touch in an instant. Get Windows Live Messenger now. > http://www.windowslive.com/messenger/overview.html?ocid=TXT_TAGLM_WL_Refresh_getintouch_042008 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From npf-mlists at eurotux.com Fri Apr 4 09:06:35 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 4 Apr 2008 10:06:35 +0100 Subject: [Linux-cluster] Re: [Iscsitarget-devel] Problem in clvmd and iscsi-target In-Reply-To: References: <200804031811.38493.npf-mlists@eurotux.com> Message-ID: <200804041006.36217.npf-mlists@eurotux.com> > > Meanwhile i narrow it down to a iscsi-target problem. Doing strace i saw > > it requires exclusive open. So solve it it created an sort of an > > hack.. :) > > Ok, you cannot have local services and remote services accessing > the same raw data sectors at the same time. > > I expressedly made blockio open the device exclusively to avoid > just what you are trying to do. Why? Did you have any problems in this scenario? Using blockio iscsi-target does not use any cache, right? > If you really need access to the iSCSI target locally, then > you will need to install an iSCSI initiator on the server and > connect to the target and use the new disk volume that is > created. Ok.. and is the following scenario workable? 
HP package cluster with MSA500 (basically 2 servers connected to a scsi shared storage) both servers export the same msa500 volume on clients i use iscsi-initiator + multipath to support the failure of one server In this scenario there are two iscsi-target server accessing the same data but on different machines. Basically it's the same thing as another process accessing the exported volume... Do you agree? > > from hdb and using device-mapper i create a linear mapping to 2 devices: > > > > hdb-int > > hdb-ext > > > > Next i put hdb-ext in ietd.conf and hdb-int in lvm.conf. iscsi-target > > still opens in exclusive mode but it only opens hdb-ext device. > > clvmd uses hdb-int that has no exclusive lock. > > You are heading down a path here that is fully unsupported and > discouraged, if you loose your data it will be completely and > totally because of this. :( > > Another way that we rejected is to put "gray" machine exporting the iscsi > > volume to it self also and using that device in clvmd. This option also > > worked but has less performance as all local contend has to be encoded to > > iscsi and decoded in the same machine. > > iSCSI encoding/decoding is minimal and if you are using a loopback > to connect should be as fast as the machine can do it. Do not try > this with fileio unit types though or the page cache will deadlock > between iSCSI target/initiator. IMHO encoding and decoding it locally would be a performance penalty that would not be required.. I dind't used fileio exactly because it caches content. > > The problem of iscsi-target opening the device excluse remais. > > And it will continue to remain. > > Why not try to explain what it is your are trying to accomplish > and a valid solution or two can be given. What i'm trying to do is the following: 2 servers connected to msa500 exporting a volume through iscsi-target 6 servers with iscsi-initiator + multipath accessing the volume all 8 servers with clvmd accessing the volume so i can create LVs in any of the 8 servers. The LVs are to be used by Xen virtual machines in any machine in the cluster I didn't want to use iscsi-target + iscsi-initiator on the msa connected machines as it would be a performance penalty. Thanks, Nuno Fernandes From maciej.bogucki at artegence.com Fri Apr 4 10:28:53 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Fri, 04 Apr 2008 12:28:53 +0200 Subject: [Linux-cluster] Is there a fencing agent I can use for iscsi ?(GFS and iSCSI) In-Reply-To: <1207298460.15409.39.camel@admc.win-rar.local> References: <1207298460.15409.39.camel@admc.win-rar.local> Message-ID: <47F602E5.9040306@artegence.com> jr napisa?(a): >> I was wondering if anyone has written a iscsi fencing agent that I could use. I saw one written in perl that ssh'd into the node and added an iptables entry in order to fence the server from the iscsi target. It was from 2004 and didn't run correctly on my machine. Does anyone have any ideas? Or should I try and salvage the one I found and fix it up? Thanks. > > if you need to use it (as suggested in that other reply), i'd make sure > it doesn't connect to a node but to the iSCSI target and adds the > firewall rules there :) or even better if you have a managed switch in > between where you can simply disable the ethernet port (or even better, > have iSCSI on a separate vlan and remove the port from that vlan) via an > ssh script or maybe snmp or whatever. > enjoy, Another option is fencing via power device fe. 
fence_apc, fence_apc_snmp but You would need tu but APC hardware. Fenceing via fence_ilo, fence_rsa. fence_ipmilan is the option if You would have IBM, Dell or HP servers. You could also try fence_scsi without any costs, but it doesn't works if You had multipath configuration. Best Regards Maciej Bogucki From rpeterso at redhat.com Fri Apr 4 13:37:55 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 04 Apr 2008 08:37:55 -0500 Subject: [Linux-cluster] fsck.gfs2 seg fault In-Reply-To: <000501c895e8$5f13e5f0$9601a8c0@corp.netone.gr> References: <000501c895e8$5f13e5f0$9601a8c0@corp.netone.gr> Message-ID: <1207316275.2740.10.camel@technetium.msp.redhat.com> On Fri, 2008-04-04 at 03:10 +0300, Theophanis Kontogiannis wrote: > Hello all > > I have RHEL 5.1 with 2.6.18 kernel and fsck.gfs2 (GFS2 fsck 0.1.38) > always seg faults at 99% with: > > fsck.gfs2[8245]: segfault at 0000000000000018 rip 00000000004047db rsp > 00007ffffbabecb0 error 4 > > Any ideas on that? > > > Thank you all for your time > > Theophanis Kontogiannis Hi Theophanis, That's not enough information to tell what's going on. It would probably be more helpful if I could get the full call trace, and the last several lines of output from fsck.gfs2 -vv . It would even be better if you built gfs2.fsck from the latest source tree, changed the Makefile to use "-g" rather than "-O2", compiled it, and then ran it from gdb. Then if/when it segfaults, you can do "bt" to get a better backtrace. Also, be aware that GFS2 is not ready for production in RHEL5.1. Regards, Bob Peterson Red Hat Clustering & GFS From johannes.russek at io-consulting.net Fri Apr 4 14:15:43 2008 From: johannes.russek at io-consulting.net (jr) Date: Fri, 04 Apr 2008 16:15:43 +0200 Subject: [Linux-cluster] Nagios check Message-ID: <1207318543.15409.42.camel@admc.win-rar.local> Hi Everybody, i wonder if i'm the first with the need to check the status of GFS / cman with nagios. Did anyone maybe already write a check script i did not find yet? i found one via google, but it basically just did an ls -l on the GFS share, and that seems to be a little bit too less for monitoring.. thanks in advance, regards, johannes From rotsen at gmail.com Fri Apr 4 14:41:43 2008 From: rotsen at gmail.com (=?ISO-8859-1?Q?N=E9stor?=) Date: Fri, 4 Apr 2008 07:41:43 -0700 Subject: [Linux-cluster] How to use manual fencing in Redhat Cluster Suit In-Reply-To: <553769.13176.qm@web45812.mail.sp1.yahoo.com> References: <553769.13176.qm@web45812.mail.sp1.yahoo.com> Message-ID: Mohammed, I need to do DRBD on RHEL5. Let me know wha tyou find out while doing your project I am sure I can use your experiences when I start my DRBD. Good Luck, N?stor :-) 2008/4/4 Mshehzad Pankhawala : > Thanks for your kind reply, > I have successfully configured two node cluster using > system-config-cluster with given settings. > Now trying to use drbd with RHCS. > Regards > MohammedShehzad > > ------------------------------ > You rock. That's why Blockbuster's offering you one month of Blockbuster > Total Access, > No Cost. > > ------------------------------ > You rock. That's why Blockbuster's offering you one month of Blockbuster > Total Access, > No Cost. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
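Since a couple of people in this thread are heading toward DRBD under RHCS on RHEL 5, a minimal two-node drbd.conf resource may be a useful starting point. This is a sketch only: the hostnames (which must match `uname -n` on each node), backing devices and addresses are placeholders, and the allow-two-primaries line belongs there only if a cluster filesystem such as GFS will sit on top; for an ordinary failover setup it should be left out.

resource r0 {
  protocol C;                      # synchronous replication
  net {
    allow-two-primaries;           # only for primary/primary under a cluster FS
  }
  on node1.example.com {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.10.1:7789;
    meta-disk internal;
  }
  on node2.example.com {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.10.2:7789;
    meta-disk internal;
  }
}

Roughly speaking, after drbdadm create-md r0 and drbdadm up r0 on both nodes, the device is promoted with drbdadm primary r0 where needed and then used like any other block device.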
URL: From tiagocruz at forumgdh.net Fri Apr 4 14:46:25 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Fri, 04 Apr 2008 11:46:25 -0300 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? Message-ID: <1207320385.19045.19.camel@tuxkiller.ig.com.br> If I'm using one LVM device, export it using GNDB, and format using GFS (1 or 2).... I will can add or remove some GB on this filesystem? Thanks -- Tiago Cruz http://everlinux.com Linux User #282636 From npf-mlists at eurotux.com Fri Apr 4 15:04:11 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 4 Apr 2008 16:04:11 +0100 Subject: [Linux-cluster] Re: [Iscsitarget-devel] Problem in clvmd and iscsi-target In-Reply-To: References: <200804041006.36217.npf-mlists@eurotux.com> Message-ID: <200804041604.11396.npf-mlists@eurotux.com> Hello, First of all i would like to thank your patience... > There is a lot of confusion by newcomers to iSCSI storage. > > A lot of the time they think of iSCSI as yet another > file sharing method, which it isn't, it is a disk sharing > method, and if you allow 2 hosts to access the same disk > without putting special controls in place to make sure > that either 1) only 1 host at a time can access a given > disk, or 2) install a clustering file system that allows > multiple hosts to access the same disk at the same time, > then they will experience data corruption as there is > nothing preventing any two hosts from writing data on > top of each other. I understant.. iscsi has nothing to do with files or filesystems. Iscsi (and scsi for that matter) only work with blocks. If you try to put several machines accessing the same filesystem that is not cluster-aware you'll have lots of corruptions.. > The performance penalty you speak of with blockio being accessed > through a local iSCSI connection should really not be noticed > except for extreme high-end processing, which if that is the > case you are picking the wrong technology. We have bladecenter with FC storage for that :) What we are trying to do is remove "unecessary" load in the msa connected machines as they will be used for virtual machines also. > When you mount an iscsi target locally the open-iscsi initiator > does agressive caching of io, then the file system of the OS > does agressive caching itself, so it's not as if all io becomes > synchronous in this scenario. You are correct but that also happens with 2 open-iscsi initiators accessing the same exported volume in different machines. The only difference is that instead of the msa500 volume being exported directly by iscsi-target there is a middleware (device-mapper) between msa500 volume and the iscsi-target. Device-mapper does not do cache. When we do an fsync in a guest machine it goes: virtual machine fsync -> clvmd/lvm -> iscsi-initiator -> iscsi-target -> device-mapper -> msa500 when the virtual machine is running in the msa500 connected hardware we get virtual machine fsync -> clvmd/lvm -> device-mapper (linear) -> msa500 > Now you can use clvm between the iSCSI targets to manage > how the MSA500 storage is allocated for the creation of > iSCSI targets, but once exported by iSCSI, these servers > should not care about what the initiators put into it > or how they manage it. That would require us to be changing all the time the iscsi-target and initiators confs as well as iscsi discovers and multipath in all the iscsi-initiators machines. 
When we create a volume to a virtual machine we would have to do: 1 - create volume in clvmd that manages the storage 2 - change ietd.conf to allow it to be exported 3 - discover the new device in initiators 4 - change multipath in initiators including the new volume Drawbacks: 1 - lots of changes in conf files, restarting services :) 2 - Multipath has a patchchecker that checks if a path is alive (usually readblock0). That would give me lots and lots of readblock0.. total checks in msa500 = num client machines * num multipath devices * num iscsi-target machines With 8 machines and 40 volumes we would have: 8 * 40 * 2 = 640 IO checks > > +--------+ <-> |- initiator1 > > | iSCSI1 | | > > +--------+ <-> +--------+ <-> |- initiator2 > > | MSA500 | (2) (3) (4) | (5) > > +--------+ <-> +--------+ <-> |- initiator3 > (1) | iSCSI2 | | > +--------+ <-> |- initiator4 > > 1) MSA500 provides volume1, volume2 to > fiber hosts iSCSI1/iSCSI2 > 2) iSCSI1/iSCSI2 fiber connect to MSA500 > 3) iSCSI1/iSCSI2 use clvm to divvy up > volume1 and volume2 into target1, target2 > target3, target4, target5 to iSCSI network > 4) iSCSI1/iSCSI2 provide targets to iSCSI > network through bonded pairs > 5) initiators use clvm to divvy up target1, > target2, target3... storage for use by Xen > domains. > I hope that helps. We are doing stress tests (bonnie++, ctcs) with our "hack" and so far it never had any problems. We even shutdown one of the iscsi-target nodes there's a small hiccup (as one path failed) but it continues shortly after. We've changed node.session.timeo.replacement_timeout node.conn[0].timeo.noop_out_interval node.conn[0].timeo.noop_out_timeout to increase the speed of the failover.. Thanks again, Nuno Fernandes From RJM002 at shsu.edu Fri Apr 4 15:26:46 2008 From: RJM002 at shsu.edu (MARTI, ROBERT JESSE) Date: Fri, 4 Apr 2008 10:26:46 -0500 Subject: [Linux-cluster] Nagios check References: <1207318543.15409.42.camel@admc.win-rar.local> Message-ID: <9F633DE6C0E04F4691DCB713AC44C94B066B56C7@EXCHANGE.SHSU.EDU> IIRC, theres a cluster snmp package - I would see what I can pull from there. Rob Marti Sam Houston State University Systems Analyst II 936-294-3804 // rjm002 at shsu.edu -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of jr Sent: Fri 4/4/2008 09:15 To: linux clustering Subject: [Linux-cluster] Nagios check Hi Everybody, i wonder if i'm the first with the need to check the status of GFS / cman with nagios. Did anyone maybe already write a check script i did not find yet? i found one via google, but it basically just did an ls -l on the GFS share, and that seems to be a little bit too less for monitoring.. thanks in advance, regards, johannes -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From garromo at us.ibm.com Fri Apr 4 15:50:26 2008 From: garromo at us.ibm.com (Gary Romo) Date: Fri, 4 Apr 2008 09:50:26 -0600 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: <1207320385.19045.19.camel@tuxkiller.ig.com.br> Message-ID: You cannot shrink/reduce the size of a GFS file system. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com Tiago Cruz Sent by: linux-cluster-bounces at redhat.com 04/04/2008 08:46 AM Please respond to linux clustering To linux clustering cc Subject [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? 
If I'm using one LVM device, export it using GNDB, and format using GFS (1 or 2).... I will can add or remove some GB on this filesystem? Thanks -- Tiago Cruz http://everlinux.com Linux User #282636 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From johannes.russek at io-consulting.net Fri Apr 4 16:22:52 2008 From: johannes.russek at io-consulting.net (jr) Date: Fri, 04 Apr 2008 18:22:52 +0200 Subject: [Linux-cluster] Nagios check In-Reply-To: <9F633DE6C0E04F4691DCB713AC44C94B066B56C7@EXCHANGE.SHSU.EDU> References: <1207318543.15409.42.camel@admc.win-rar.local> <9F633DE6C0E04F4691DCB713AC44C94B066B56C7@EXCHANGE.SHSU.EDU> Message-ID: <1207326172.15409.45.camel@admc.win-rar.local> good idea! if only i wouldn't run rhel5 x86_64 (centos in this case) which still maintains a bug of snmpd that causes it to lock up and stay in an infinite loop with 99% cpu usage :/ regards, johannes Am Freitag, den 04.04.2008, 10:26 -0500 schrieb MARTI, ROBERT JESSE: > IIRC, theres a cluster snmp package - I would see what I can pull from there. > > Rob Marti > Sam Houston State University > Systems Analyst II > 936-294-3804 // rjm002 at shsu.edu > > > > -----Original Message----- > From: linux-cluster-bounces at redhat.com on behalf of jr > Sent: Fri 4/4/2008 09:15 > To: linux clustering > Subject: [Linux-cluster] Nagios check > > Hi Everybody, > i wonder if i'm the first with the need to check the status of GFS / > cman with nagios. > Did anyone maybe already write a check script i did not find yet? > i found one via google, but it basically just did an ls -l on the GFS > share, and that seems to be a little bit too less for monitoring.. > thanks in advance, > regards, > johannes > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From alex.kompel at 23andme.com Fri Apr 4 17:29:09 2008 From: alex.kompel at 23andme.com (Alex Kompel) Date: Fri, 4 Apr 2008 10:29:09 -0700 Subject: [Linux-cluster] Is there a fencing agent I can use for iscsi ?(GFS and iSCSI) In-Reply-To: <47F602E5.9040306@artegence.com> References: <1207298460.15409.39.camel@admc.win-rar.local> <47F602E5.9040306@artegence.com> Message-ID: <68a019b50804041029n37f52dc8od34c597a23b928c3@mail.gmail.com> 2008/4/4 Maciej Bogucki : > jr napisa?(a): > >> I was wondering if anyone has written a iscsi fencing agent that I > could use. I saw one written in perl that ssh'd into the node and added an > iptables entry in order to fence the server from the iscsi target. It was > from 2004 and didn't run correctly on my machine. Does anyone have any > ideas? Or should I try and salvage the one I found and fix it up? Thanks. > > > > if you need to use it (as suggested in that other reply), i'd make sure > > it doesn't connect to a node but to the iSCSI target and adds the > > firewall rules there :) or even better if you have a managed switch in > > between where you can simply disable the ethernet port (or even better, > > have iSCSI on a separate vlan and remove the port from that vlan) via an > > ssh script or maybe snmp or whatever. > > enjoy, > > Another option is fencing via power device fe. fence_apc, fence_apc_snmp > but You would need tu but APC hardware. Fenceing via fence_ilo, > fence_rsa. 
fence_ipmilan is the option if You would have IBM, Dell or HP > servers. You could also try fence_scsi without any costs, but it doesn't > works if You had multipath configuration. > I second that: fence_scsi should work pretty well if your target supports SCSI-3 persistent reservations. It does not make much sense to use multipath I/O for iSCSI since channel bonding provides the same functionality nowadays. Also, if you have 2-node cluster then you can configure quorum disk on iSCSI volume as a tiebreaker . -Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgarrity at qualcomm.com Fri Apr 4 19:03:28 2008 From: jgarrity at qualcomm.com (John Garrity) Date: Fri, 04 Apr 2008 12:03:28 -0700 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <68a019b50804041029n37f52dc8od34c597a23b928c3@mail.gmail.co m> References: <1207298460.15409.39.camel@admc.win-rar.local> <47F602E5.9040306@artegence.com> <68a019b50804041029n37f52dc8od34c597a23b928c3@mail.gmail.com> Message-ID: <200804041903.m34J3UQX017586@msgtransport04.qualcomm.com> I'm trying to get ftp working in a LVS DR cluster. I think it's the iptables rules that might be giving me a problem. I have http services working well. Can someone who has ftp working share their ip tables rules? I'm new at this so please go easy on me. Thanks! From sghosh at redhat.com Fri Apr 4 19:06:30 2008 From: sghosh at redhat.com (Subhendu Ghosh) Date: Fri, 04 Apr 2008 15:06:30 -0400 Subject: [Linux-cluster] Nagios check In-Reply-To: <1207326172.15409.45.camel@admc.win-rar.local> References: <1207318543.15409.42.camel@admc.win-rar.local> <9F633DE6C0E04F4691DCB713AC44C94B066B56C7@EXCHANGE.SHSU.EDU> <1207326172.15409.45.camel@admc.win-rar.local> Message-ID: <47F67C36.20502@redhat.com> If you would like this in the standard plugins distribution, let me know. There is a lot of back end work happening with the plugins. -regards Subhendu jr wrote: > good idea! > if only i wouldn't run rhel5 x86_64 (centos in this case) which still > maintains a bug of snmpd that causes it to lock up and stay in an > infinite loop with 99% cpu usage :/ > > regards, > johannes > > > Am Freitag, den 04.04.2008, 10:26 -0500 schrieb MARTI, ROBERT JESSE: >> IIRC, theres a cluster snmp package - I would see what I can pull from there. >> >> Rob Marti >> Sam Houston State University >> Systems Analyst II >> 936-294-3804 // rjm002 at shsu.edu >> >> >> >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com on behalf of jr >> Sent: Fri 4/4/2008 09:15 >> To: linux clustering >> Subject: [Linux-cluster] Nagios check >> >> Hi Everybody, >> i wonder if i'm the first with the need to check the status of GFS / >> cman with nagios. >> Did anyone maybe already write a check script i did not find yet? >> i found one via google, but it basically just did an ls -l on the GFS >> share, and that seems to be a little bit too less for monitoring.. >> thanks in advance, >> regards, >> johannes >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Red Hat Summit Boston | June 18-20, 2008 Learn more: http://www.redhat.com/summit -------------- next part -------------- A non-text attachment was scrubbed... 
Name: sghosh.vcf Type: text/x-vcard Size: 266 bytes Desc: not available URL: From tiagocruz at forumgdh.net Fri Apr 4 19:04:49 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Fri, 04 Apr 2008 16:04:49 -0300 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: References: Message-ID: <1207335889.19045.37.camel@tuxkiller.ig.com.br> Hum.... so, how can I increase/grow one filesystem with GFS? :-) Its stable and secure does this? Many thanks On Fri, 2008-04-04 at 09:50 -0600, Gary Romo wrote: > > You cannot shrink/reduce the size of a GFS file system. > > Gary Romo > IBM Global Technology Services > 303.458.4415 > Email: garromo at us.ibm.com > Pager:1.877.552.9264 > Text message: gromo at skytel.com > > > Tiago Cruz > > Sent by: > linux-cluster-bounces at redhat.com > > 04/04/2008 08:46 AM > Please respond to > linux clustering > > > > > > To > linux clustering > > cc > > Subject > [Linux-cluster] > Can I shrink/grow > one GFS{12} > FileSystem? > > > > > > > > > If I'm using one LVM device, export it using GNDB, and format using > GFS > (1 or 2).... I will can add or remove some GB on this filesystem? > > Thanks > -- > Tiago Cruz > http://everlinux.com > Linux User #282636 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Tiago Cruz http://everlinux.com Linux User #282636 From Derek.Anderson at compellent.com Fri Apr 4 19:22:56 2008 From: Derek.Anderson at compellent.com (Derek Anderson) Date: Fri, 4 Apr 2008 14:22:56 -0500 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: <1207335889.19045.37.camel@tuxkiller.ig.com.br> References: <1207335889.19045.37.camel@tuxkiller.ig.com.br> Message-ID: <99E0F1976E2DA2499F3E6EB18B25F036040C7974@honeywheat.Beer.Town> Tiago, You can grow the filesystem with gfs_grow, once the underlying device has been expanded. See gfs_grow(8) for more information. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Tiago Cruz Sent: Friday, April 04, 2008 2:05 PM To: linux clustering Cc: linux-cluster-bounces at redhat.com Subject: Re: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? Hum.... so, how can I increase/grow one filesystem with GFS? :-) Its stable and secure does this? Many thanks On Fri, 2008-04-04 at 09:50 -0600, Gary Romo wrote: > > You cannot shrink/reduce the size of a GFS file system. > > Gary Romo > IBM Global Technology Services > 303.458.4415 > Email: garromo at us.ibm.com > Pager:1.877.552.9264 > Text message: gromo at skytel.com > > > Tiago Cruz > > Sent by: > linux-cluster-bounces at redhat.com > > 04/04/2008 08:46 AM > Please respond to > linux clustering > > > > > > To > linux clustering > > cc > > Subject > [Linux-cluster] > Can I shrink/grow > one GFS{12} > FileSystem? > > > > > > > > > If I'm using one LVM device, export it using GNDB, and format using > GFS > (1 or 2).... I will can add or remove some GB on this filesystem? 
> > Thanks > -- > Tiago Cruz > http://everlinux.com > Linux User #282636 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Tiago Cruz http://everlinux.com Linux User #282636 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From chawkins at veracitynetworks.com Fri Apr 4 19:31:45 2008 From: chawkins at veracitynetworks.com (Christopher Hawkins) Date: Fri, 4 Apr 2008 15:31:45 -0400 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <200804041903.m34J3UQX017586@msgtransport04.qualcomm.com> Message-ID: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> Never had to load balance it myself, but have heard of FTP over LVS issues due to lack of persistence (make sure it's on) and due to port 21 and 20 getting sent to different servers. The solution was to remove port 20 from LVS. With LVS NAT there is a special FTP module you can load, but it should not be required in LVS DR. Or are you sure the issue is iptables? Also I would suggest the LVS mailing list if someone here can't solve this quickly. ;-) -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of John Garrity Sent: Friday, April 04, 2008 3:03 PM To: linux clustering Subject: [Linux-cluster] iptables rules for LVS-DR cluster I'm trying to get ftp working in a LVS DR cluster. I think it's the iptables rules that might be giving me a problem. I have http services working well. Can someone who has ftp working share their ip tables rules? I'm new at this so please go easy on me. Thanks! -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From garromo at us.ibm.com Fri Apr 4 19:35:42 2008 From: garromo at us.ibm.com (Gary Romo) Date: Fri, 4 Apr 2008 13:35:42 -0600 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: <1207335889.19045.37.camel@tuxkiller.ig.com.br> Message-ID: extend the VG: vgextend extend the LV: lvextend Grow the GFS file system: gfs_grow -v device Consider if space is also needed for additional journals. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com Tiago Cruz Sent by: linux-cluster-bounces at redhat.com 04/04/2008 01:04 PM Please respond to linux clustering To linux clustering cc linux-cluster-bounces at redhat.com Subject Re: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? Hum.... so, how can I increase/grow one filesystem with GFS? :-) Its stable and secure does this? Many thanks On Fri, 2008-04-04 at 09:50 -0600, Gary Romo wrote: > > You cannot shrink/reduce the size of a GFS file system. > > Gary Romo > IBM Global Technology Services > 303.458.4415 > Email: garromo at us.ibm.com > Pager:1.877.552.9264 > Text message: gromo at skytel.com > > > Tiago Cruz > > Sent by: > linux-cluster-bounces at redhat.com > > 04/04/2008 08:46 AM > Please respond to > linux clustering > > > > > > To > linux clustering > > cc > > Subject > [Linux-cluster] > Can I shrink/grow > one GFS{12} > FileSystem? > > > > > > > > > If I'm using one LVM device, export it using GNDB, and format using > GFS > (1 or 2).... I will can add or remove some GB on this filesystem? 
> > Thanks > -- > Tiago Cruz > http://everlinux.com > Linux User #282636 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Tiago Cruz http://everlinux.com Linux User #282636 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From tiagocruz at forumgdh.net Fri Apr 4 19:59:15 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Fri, 04 Apr 2008 16:59:15 -0300 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: <99E0F1976E2DA2499F3E6EB18B25F036040C7974@honeywheat.Beer.Town> References: <1207335889.19045.37.camel@tuxkiller.ig.com.br> <99E0F1976E2DA2499F3E6EB18B25F036040C7974@honeywheat.Beer.Town> Message-ID: <1207339155.19045.48.camel@tuxkiller.ig.com.br> ?Anderson and ?Gary, Many thanks for your attention! I've did one test here, and works perfectly! I just have one problem, because I need to restart my gnbd_serv, and re-export the GNDB device for nodes, because I've got this error messages: Apr 4 16:40:15 xen-7 gnbd_serv[23817]: ERROR size of the exported file /dev/Vol_LVM/mycluster has changed, aborting Apr 4 16:40:15 xen-7 gnbd_serv[23817]: server process 2956 exited because of signal 11 Apr 4 16:40:15 xen-7 kernel: gnbd_serv[2956]: segfault at 000000000000000c rip 0000000000405ab0 rsp 00007fff36dcb450 error 4 Apr 4 16:41:10 xen-7 gnbd_serv[2970]: startup succeeded Apr 4 16:41:17 xen-7 gnbd_serv[2970]: got local command 0x1 Apr 4 16:41:17 xen-7 gnbd_serv[2970]: gnbd device 'cluster' serving /dev/Vol_LVM/mycluster exported with 41943040 sectors But I did another test, and this time I've just "restart" the export using: # gnbd_export -R -O # gnbd_export -c -d /dev/Vol_LVM/mycluster -e cluster And sounds like fine on the nodes... but I don't know if this process it's recommend, or if this force (-O Force unexport) can be dangerous for the filesystem... Thanks On Fri, 2008-04-04 at 14:22 -0500, Derek Anderson wrote: > You can grow the filesystem with gfs_grow, once the underlying device > has been expanded. See gfs_grow(8) for more information. From Derek.Anderson at compellent.com Fri Apr 4 20:36:13 2008 From: Derek.Anderson at compellent.com (Derek Anderson) Date: Fri, 4 Apr 2008 15:36:13 -0500 Subject: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? In-Reply-To: <1207339155.19045.48.camel@tuxkiller.ig.com.br> References: <1207335889.19045.37.camel@tuxkiller.ig.com.br><99E0F1976E2DA2499F3E6EB18B25F036040C7974@honeywheat.Beer.Town> <1207339155.19045.48.camel@tuxkiller.ig.com.br> Message-ID: <99E0F1976E2DA2499F3E6EB18B25F036040C7AFD@honeywheat.Beer.Town> Sorry, I missed the part about you using GNBD. It's been awhile, and I haven't tested it, but I think the safest procedure might be: 1. Unmount from gnbd clients. 2. Un-import from gnbd clients. 3. Un-export gnbd devices from the server _without_ the Override option. 4. Extend the VG and LV. 5. Re-export the gnbd device from the server. 6. Re-import gnbd devices from clients. 7. Re-mount gnbd devices on clients. 8. Run gfs_grow from one client. Hopefully, gnbd_serv won't complain about the device size changing underneath it if that device is not currently exported. If it does you will probably need to stop and restart it around step 4. 
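Spelled out as commands against the volume and export names used earlier in this thread (/dev/Vol_LVM/mycluster exported as "cluster"), the eight steps might look like the following. This is untested; the size, extra PV, server name and mount point are placeholders, and the gnbd_import flags are worth checking against gnbd_import(8).

# 1-2. on every gnbd client
umount /mnt/gfs
gnbd_import -R                                        # drop the imported gnbd devices

# 3-5. on the gnbd server
gnbd_export -R                                        # unexport, without the -O override
vgextend Vol_LVM /dev/sdX                             # only if the VG itself needs more space
lvextend -L +10G /dev/Vol_LVM/mycluster
gnbd_export -c -d /dev/Vol_LVM/mycluster -e cluster   # re-export as before

# 6-7. on every gnbd client
gnbd_import -i gnbd-server
mount -t gfs /dev/gnbd/cluster /mnt/gfs

# 8. on one client only
gfs_grow -v /mnt/gfs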
Overriding the unexport of gnbd devices can be hazardous. See the warning in gnbd_export(8). Good luck: - Derek -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Tiago Cruz Sent: Friday, April 04, 2008 2:59 PM To: linux clustering Subject: RE: [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? Anderson and ?Gary, Many thanks for your attention! I've did one test here, and works perfectly! I just have one problem, because I need to restart my gnbd_serv, and re-export the GNDB device for nodes, because I've got this error messages: Apr 4 16:40:15 xen-7 gnbd_serv[23817]: ERROR size of the exported file /dev/Vol_LVM/mycluster has changed, aborting Apr 4 16:40:15 xen-7 gnbd_serv[23817]: server process 2956 exited because of signal 11 Apr 4 16:40:15 xen-7 kernel: gnbd_serv[2956]: segfault at 000000000000000c rip 0000000000405ab0 rsp 00007fff36dcb450 error 4 Apr 4 16:41:10 xen-7 gnbd_serv[2970]: startup succeeded Apr 4 16:41:17 xen-7 gnbd_serv[2970]: got local command 0x1 Apr 4 16:41:17 xen-7 gnbd_serv[2970]: gnbd device 'cluster' serving /dev/Vol_LVM/mycluster exported with 41943040 sectors But I did another test, and this time I've just "restart" the export using: # gnbd_export -R -O # gnbd_export -c -d /dev/Vol_LVM/mycluster -e cluster And sounds like fine on the nodes... but I don't know if this process it's recommend, or if this force (-O Force unexport) can be dangerous for the filesystem... Thanks On Fri, 2008-04-04 at 14:22 -0500, Derek Anderson wrote: > You can grow the filesystem with gfs_grow, once the underlying device > has been expanded. See gfs_grow(8) for more information. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From linux-cluster at merctech.com Fri Apr 4 22:41:56 2008 From: linux-cluster at merctech.com (linux-cluster at merctech.com) Date: Fri, 04 Apr 2008 18:41:56 -0400 Subject: [Linux-cluster] anyone modified fence_mcdata to use ssh instead of telnet? Message-ID: <29376.1207348916@localhost> Telnet is fundamentally insecure. We've known this for about 20 years. Finally, network switches, fibre switches, appliances, etc., have begun to recognize this truth. For example, the McData fibre switches give you the choice of telnet (evil) or ssh (good). Note that this is a choice between them...you cannot have both protocols enabled at once (at least not with the switch hardware and firmware rev I'm using). So, like a good sysadmin, I enable ssh on my McData Sphereon 4400. I can ssh into the switch and configure it via the command line. Happiness. Unfortunately, the fence_mcdata script assumes that the only way to connect to the switch is via (evil) telnet. Before I start hacking the fence_mcdata script...has anyone already modified this to make it more secure? If not, this would be a simple product enhancement (hint, hint). 
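As a stop-gap while fence_mcdata itself still assumes telnet, the same job can be done by a tiny wrapper agent that fenced calls instead, turning the stdin options into a single ssh command. The sketch below is untested, and the two CLI strings are deliberate placeholders: the exact port-block and unblock syntax has to come from the Sphereon's own CLI reference, since that is the part that varies by switch and firmware.

#!/bin/bash
# Sketch of an ssh-based stand-in for fence_mcdata.  Untested.
# BLOCK_CMD/UNBLOCK_CMD are placeholders to be filled in from the
# switch's CLI manual.  fenced passes name=value pairs on stdin.

while read line; do
    case "$line" in
        ipaddr=*) SWITCH="${line#ipaddr=}" ;;
        login=*)  LOGIN="${line#login=}"   ;;
        port=*)   PORT="${line#port=}"     ;;
        option=*) ACTION="${line#option=}" ;;  # may be named "action" instead
    esac
done

BLOCK_CMD="<block command for port $PORT>"            # placeholder
UNBLOCK_CMD="<unblock command for port $PORT>"        # placeholder

case "$ACTION" in
    off|"") exec ssh "$LOGIN@$SWITCH" "$BLOCK_CMD"   ;;
    on)     exec ssh "$LOGIN@$SWITCH" "$UNBLOCK_CMD" ;;
    *)      exit 1 ;;
esac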
Thanks, Mark From johannes.russek at io-consulting.net Sat Apr 5 00:20:38 2008 From: johannes.russek at io-consulting.net (Johannes Russek) Date: Sat, 05 Apr 2008 02:20:38 +0200 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> References: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> Message-ID: <47F6C5D6.8030502@io-consulting.net> we use this together with firewall mark rule in lvs-DR (piranha) and scheduler "rr" and persistent = 20: -A PREROUTING -d $VIP-i eth0 -p tcp -m tcp --dport 10000:20000 -j MARK --set-mark 0x14 -A PREROUTING -d $VIP -i eth0 -p tcp -m tcp --dport 20 -j MARK --set-mark 0x14 -A PREROUTING -d $VIP -i eth0 -p tcp -m tcp --dport 21 -j MARK --set-mark 0x14 also vsftpd.conf is configured with pasv_min_port=10000 pasv_max_port=20000 hope this helps? regards, johannes p.s.: of course the main firewall has to open the appropiate ports as well Christopher Hawkins schrieb: > Never had to load balance it myself, but have heard of FTP over LVS issues > due to lack of persistence (make sure it's on) and due to port 21 and 20 > getting sent to different servers. The solution was to remove port 20 from > LVS. With LVS NAT there is a special FTP module you can load, but it should > not be required in LVS DR. Or are you sure the issue is iptables? > > Also I would suggest the LVS mailing list if someone here can't solve this > quickly. ;-) > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of John Garrity > Sent: Friday, April 04, 2008 3:03 PM > To: linux clustering > Subject: [Linux-cluster] iptables rules for LVS-DR cluster > > I'm trying to get ftp working in a LVS DR cluster. I think it's the iptables > rules that might be giving me a problem. I have http services working well. > Can someone who has ftp working share their ip tables rules? I'm new at this > so please go easy on me. Thanks! > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jan.gerrit at kootstra.org.uk Sat Apr 5 07:21:20 2008 From: jan.gerrit at kootstra.org.uk (Jan Gerrit Kootstra) Date: Sat, 05 Apr 2008 09:21:20 +0200 Subject: [Linux-cluster] Linux-cluster] Re: Can I shrink/grow one GFS{12} FileSystem? Message-ID: <47F72870.8000308@kootstra.org.uk> You cannot shrink/reduce the size of a GFS file system. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com Tiago Cruz Sent by: linux-cluster-bounces at redhat.com 04/04/2008 08:46 AM Please respond to linux clustering To linux clustering cc Subject [Linux-cluster] Can I shrink/grow one GFS{12} FileSystem? If I'm using one LVM device, export it using GNDB, and format using GFS (1 or 2).... I will can add or remove some GB on this filesystem? Thanks -- Tiago Cruz http://everlinux.com Linux User #282636 -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster Gary, You are right about reducing/shrinking a GFS filesystem. Tiago also asked about expanding/growing, for GFS(2) this can be done with gfs(2)_grow. Both commands run only on mounted file systems. Kind regards, Jan Gerrit Kootstra -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jgarrity at qualcomm.com Sat Apr 5 18:04:33 2008 From: jgarrity at qualcomm.com (John Garrity) Date: Sat, 05 Apr 2008 11:04:33 -0700 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <47F6C5D6.8030502@io-consulting.net> References: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> <47F6C5D6.8030502@io-consulting.net> Message-ID: <200804051804.m35I4XDs019407@hamtaro.qualcomm.com> Question: how did you set the scheduler to "n"? I don't see a choice for "none" in Piranha and I tried manually editing /etc/sysconfig/ha/lvs.cf with no luck. Even when I commented out the scheduler field it seems to default to wlc. Basically, I'm not sure that it's my iptables rules that are giving me a problem. Maybe it's what Christopher mentions below? How would I remove port 20 from LVS? I tried using a firewall mark of 20 and have Piranha configured to use 21 as the application port. I can ftp to the real servers using their real IPs but ftps to the VIP fail with the error on the ftp client "An existing connection was forcibly closed by the remote host." Persistence is set to 20 Here are the iptables rules I'm using # service iptables status Table: mangle Chain PREROUTING (policy ACCEPT) num target prot opt source destination 1 MARK tcp -- 0.0.0.0/0 VIP tcp dpts:10000:20000 MARK set 0x14 2 MARK tcp -- 0.0.0.0/0 VIP tcp dpt:20 MARK set 0x14 3 MARK tcp -- 0.0.0.0/0 VIP tcp dpt:21 MARK set 0x14 Chain INPUT (policy ACCEPT) num target prot opt source destination Chain FORWARD (policy ACCEPT) num target prot opt source destination Chain OUTPUT (policy ACCEPT) num target prot opt source destination Chain POSTROUTING (policy ACCEPT) num target prot opt source destination Table: filter Chain INPUT (policy ACCEPT) num target prot opt source destination 1 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0 tcp spts:1:65535 dpts:1:65535 Chain FORWARD (policy ACCEPT) num target prot opt source destination Chain OUTPUT (policy ACCEPT) num target prot opt source destination At 05:20 PM 4/4/2008, Johannes Russek wrote: >we use this together with firewall mark rule in lvs-DR (piranha) and scheduler "rr" and persistent = 20: > >-A PREROUTING -d $VIP-i eth0 -p tcp -m tcp --dport 10000:20000 -j MARK --set-mark 0x14 >-A PREROUTING -d $VIP -i eth0 -p tcp -m tcp --dport 20 -j MARK --set-mark 0x14 >-A PREROUTING -d $VIP -i eth0 -p tcp -m tcp --dport 21 -j MARK --set-mark 0x14 > >also vsftpd.conf is configured with > >pasv_min_port=10000 >pasv_max_port=20000 > >hope this helps? >regards, >johannes > >p.s.: of course the main firewall has to open the appropiate ports as well > >Christopher Hawkins schrieb: >>Never had to load balance it myself, but have heard of FTP over LVS issues >>due to lack of persistence (make sure it's on) and due to port 21 and 20 >>getting sent to different servers. The solution was to remove port 20 from >>LVS. With LVS NAT there is a special FTP module you can load, but it should >>not be required in LVS DR. Or are you sure the issue is iptables? >> >>Also I would suggest the LVS mailing list if someone here can't solve this >>quickly. ;-) >>-----Original Message----- >>From: linux-cluster-bounces at redhat.com >>[mailto:linux-cluster-bounces at redhat.com] On Behalf Of John Garrity >>Sent: Friday, April 04, 2008 3:03 PM >>To: linux clustering >>Subject: [Linux-cluster] iptables rules for LVS-DR cluster >> >>I'm trying to get ftp working in a LVS DR cluster. I think it's the iptables >>rules that might be giving me a problem. I have http services working well. 
>>Can someone who has ftp working share their ip tables rules? I'm new at this >>so please go easy on me. Thanks! >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster From johannes.russek at io-consulting.net Sun Apr 6 01:11:58 2008 From: johannes.russek at io-consulting.net (Johannes Russek) Date: Sun, 06 Apr 2008 03:11:58 +0200 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <200804051804.m35I4XDs019407@hamtaro.qualcomm.com> References: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> <47F6C5D6.8030502@io-consulting.net> <200804051804.m35I4XDs019407@hamtaro.qualcomm.com> Message-ID: <47F8235E.4000509@io-consulting.net> John Garrity schrieb: > Question: how did you set the scheduler to "n"? > i didn't. it's "rr", double-R for round-robin. > I don't see a choice for "none" in Piranha and I tried manually editing /etc/sysconfig/ha/lvs.cf with no luck. Even when I commented out the scheduler field it seems to default to wlc. > > Basically, I'm not sure that it's my iptables rules that are giving me a problem. Maybe it's what Christopher mentions below? How would I remove port 20 from LVS? > i don't think you have to do that with persistency. as i said, it works pretty good here. without much knowledge about your network, i would say it's an issue with the direct routing setup. i would suggest digging a little deeper into your network setup and checking tcpdump for the reason of the connection reset. (stateful filtering at the wrong point in the setup comes to mind). maybe you should ask at that LVS mailing list for help! good luck. johannes > I tried using a firewall mark of 20 and have Piranha configured to use 21 as the application port. I can ftp to the real servers using their real IPs but ftps to the VIP fail with the error on the ftp client "An existing connection was forcibly closed by the remote host." > > Persistence is set to 20 > > From jgarrity at qualcomm.com Sun Apr 6 21:08:26 2008 From: jgarrity at qualcomm.com (John Garrity) Date: Sun, 06 Apr 2008 14:08:26 -0700 Subject: [Linux-cluster] iptables rules for LVS-DR cluster In-Reply-To: <47F8235E.4000509@io-consulting.net> References: <200804041931.m34JVYad009708@mail2.ontariocreditcorp.com> <47F6C5D6.8030502@io-consulting.net> <200804051804.m35I4XDs019407@hamtaro.qualcomm.com> <47F8235E.4000509@io-consulting.net> Message-ID: <200804062108.m36L8RJZ010147@hamtaro.qualcomm.com> At 06:11 PM 4/5/2008, you wrote: it's "rr", double-R for round-robin. d'oh, that's what i get for not wearing my glasses! i don't think you have to do that with persistency. as i said, it works pretty good here. >without much knowledge about your network, i would say it's an issue with the direct routing setup. i would suggest digging a little deeper into your network setup and checking tcpdump for the reason of the connection reset. (stateful filtering at the wrong point in the setup comes to mind). 
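For what it's worth, a few commands that usually narrow this kind of thing down -- a sketch only, with eth0, $VIP and the 10000-20000 passive range standing in for whatever your director and vsftpd actually use:

---cut---
# on the director: are packets hitting the MARK rules and the fwmark service?
iptables -t mangle -L PREROUTING -n -v      # per-rule packet counters
ipvsadm -L -n --stats                       # per-service packet/byte counters
ipvsadm -L -n -c                            # current connection table

# on one real server: watch the control and data connections arrive and see
# which side sends the reset ("portrange" needs a reasonably recent tcpdump)
tcpdump -n -i eth0 host $VIP and \( port 21 or port 20 or portrange 10000-20000 \)
---cut---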
yeah, the output from ipvsadm is good for http IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP l423-lvs.qualcomm.com:http rr -> l423-cn1.qualcomm.com:http Route 1 0 0 -> l423-cn2.qualcomm.com:http Route 2 0 0 FWM 20 rr persistent 20 but no good for ftp [root at l423-lb1 ~]# ipvsadm IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn FWM 20 rr persistent 20 I signed up for the LVS mail list and will probably post there next week if I can't make any more progress on my own... >maybe you should ask at that LVS mailing list for help! >good luck. >johannes > >>I tried using a firewall mark of 20 and have Piranha configured to use 21 as the application port. I can ftp to the real servers using their real IPs but ftps to the VIP fail with the error on the ftp client "An existing connection was forcibly closed by the remote host." >> >>Persistence is set to 20 >> >> > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster From jakub.suchy at enlogit.cz Sun Apr 6 21:12:39 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Sun, 6 Apr 2008 23:12:39 +0200 Subject: [Linux-cluster] Virtual service without GFS Message-ID: <20080406211239.GC32651@localhost> Hi, is it possible to run a virtual service on a cluster (XEN host) without using GFS? I know I can create an ext3 partition, but it is not possible to add a resource to virtual service, so I can't join ext3 to it. Thanks, Jakub Suchy From maciej.bogucki at artegence.com Mon Apr 7 08:23:06 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Mon, 07 Apr 2008 10:23:06 +0200 Subject: [Linux-cluster] Nagios check In-Reply-To: <1207318543.15409.42.camel@admc.win-rar.local> References: <1207318543.15409.42.camel@admc.win-rar.local> Message-ID: <47F9D9EA.50006@artegence.com> jr napisa?(a): > Hi Everybody, > i wonder if i'm the first with the need to check the status of GFS / > cman with nagios. > Did anyone maybe already write a check script i did not find yet? > i found one via google, but it basically just did an ls -l on the GFS > share, and that seems to be a little bit too less for monitoring.. > thanks in advance, Here [1] is some tool to monitoring GFS. And below You have my own script. ---cut--- #!/bin/bash ok() { echo "OK - $*"; exit 0 } warning() { echo "WARNING - $*"; exit 1 } critical() { echo "CRITICAL - $*"; exit 2 } unknown() { echo "UNKNOWN - $*"; exit 3 } procfsf=/proc/cluster/services if [ ! 
-f $procfsf ] ; then critical "RHCS not running" fi procfss=$(cat /proc/cluster/services) check_clvmd=$(echo "$procfss"|grep "^DLM Lock Space"|grep "clvmd"|head -1|awk '{print $7}') check_dlm=$(echo "$procfss"|grep "^DLM Lock Space"|grep -v "clvmd"|head -1|awk '{print $7}') check_fenced=$(echo "$procfss"|grep "^Fence Domain"|head -1|awk '{print $6}') check_gfs=$(echo "$procfss"|grep "^GFS Mount Group"|head -1|awk '{print $7}') if [ -z "$check_clvmd" ] ; then critical "CLVM not running" fi if [ -z "$check_dlm" ] ; then critical "DLM not running" fi if [ -z "$check_fenced" ] ; then critical "FENCED not running" fi if [ -z "$check_gfs" ] ; then critical "GFS not running" fi if [ "$check_clvmd" != "run" ] ; then warning "CLVM in state $check_clvmd" fi if [ "$check_dlm" != "run" ] ; then warning "DLM in state $check_dlm" fi if [ "$check_fenced" != "run" ] ; then warning "FENCED in state $check_fenced" fi if [ "$check_gfs" != "run" ] ; then warning "GFS in state $check_gfs" fi gfs_res=$(echo "$procfss"|grep "^GFS Mount Group"|awk '{print $4}'|xargs echo) if [ -z "$gfs_res" ] ; then critical "RHCS is running without any active resources" fi ok "RHCS is running ($gfs_res)" ---cut--- [1] - http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F2442.html;d=1 Best Regards Maciej Bogucki From johannes.russek at io-consulting.net Mon Apr 7 09:32:25 2008 From: johannes.russek at io-consulting.net (jr) Date: Mon, 07 Apr 2008 11:32:25 +0200 Subject: [Linux-cluster] Nagios check In-Reply-To: <47F9D9EA.50006@artegence.com> References: <1207318543.15409.42.camel@admc.win-rar.local> <47F9D9EA.50006@artegence.com> Message-ID: <1207560745.15409.54.camel@admc.win-rar.local> Hi Maciej, thanks a lot for your script. If you look at [1], i'm the guy that had commented on that ;) I don't seem to have /proc/cluster? is that a rhel4 specific thing maybe? or do i need to load something first? johannes Am Montag, den 07.04.2008, 10:23 +0200 schrieb Maciej Bogucki: > jr napisa?(a): > > Hi Everybody, > > i wonder if i'm the first with the need to check the status of GFS / > > cman with nagios. > > Did anyone maybe already write a check script i did not find yet? > > i found one via google, but it basically just did an ls -l on the GFS > > share, and that seems to be a little bit too less for monitoring.. > > thanks in advance, > > Here [1] is some tool to monitoring GFS. And below You have my own script. > > ---cut--- > #!/bin/bash > > ok() { > echo "OK - $*"; exit 0 > } > warning() { > echo "WARNING - $*"; exit 1 > } > critical() { > echo "CRITICAL - $*"; exit 2 > } > unknown() { > echo "UNKNOWN - $*"; exit 3 > } > > procfsf=/proc/cluster/services > > if [ ! 
-f $procfsf ] ; then > critical "RHCS not running" > fi > > procfss=$(cat /proc/cluster/services) > check_clvmd=$(echo "$procfss"|grep "^DLM Lock Space"|grep "clvmd"|head > -1|awk '{print $7}') > check_dlm=$(echo "$procfss"|grep "^DLM Lock Space"|grep -v "clvmd"|head > -1|awk '{print $7}') > check_fenced=$(echo "$procfss"|grep "^Fence Domain"|head -1|awk '{print > $6}') > check_gfs=$(echo "$procfss"|grep "^GFS Mount Group"|head -1|awk '{print > $7}') > > if [ -z "$check_clvmd" ] ; then > critical "CLVM not running" > fi > > if [ -z "$check_dlm" ] ; then > critical "DLM not running" > fi > > if [ -z "$check_fenced" ] ; then > critical "FENCED not running" > fi > > if [ -z "$check_gfs" ] ; then > critical "GFS not running" > fi > > if [ "$check_clvmd" != "run" ] ; then > warning "CLVM in state $check_clvmd" > fi > > if [ "$check_dlm" != "run" ] ; then > warning "DLM in state $check_dlm" > fi > > if [ "$check_fenced" != "run" ] ; then > warning "FENCED in state $check_fenced" > fi > > if [ "$check_gfs" != "run" ] ; then > warning "GFS in state $check_gfs" > fi > > gfs_res=$(echo "$procfss"|grep "^GFS Mount Group"|awk '{print $4}'|xargs > echo) > > if [ -z "$gfs_res" ] ; then > critical "RHCS is running without any active resources" > fi > > ok "RHCS is running ($gfs_res)" > > ---cut--- > > [1] - > http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed%2F2442.html;d=1 > > > Best Regards > Maciej Bogucki > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From maciej.bogucki at artegence.com Mon Apr 7 09:49:41 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Mon, 07 Apr 2008 11:49:41 +0200 Subject: [Linux-cluster] Nagios check In-Reply-To: <1207560745.15409.54.camel@admc.win-rar.local> References: <1207318543.15409.42.camel@admc.win-rar.local> <47F9D9EA.50006@artegence.com> <1207560745.15409.54.camel@admc.win-rar.local> Message-ID: <47F9EE35.5030204@artegence.com> jr napisa?(a): > Hi Maciej, > thanks a lot for your script. If you look at [1], i'm the guy that had > commented on that ;) :))) > I don't seem to have /proc/cluster? is that a rhel4 specific thing > maybe? or do i need to load something first? Yes, it is for rhel4. I don't have any rhel5 with GFS, so I can't help You. Best Regards Maciej Bogucki From maciej.bogucki at artegence.com Mon Apr 7 10:01:09 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Mon, 07 Apr 2008 12:01:09 +0200 Subject: [Linux-cluster] Nagios check In-Reply-To: <47F9EE35.5030204@artegence.com> References: <1207318543.15409.42.camel@admc.win-rar.local> <47F9D9EA.50006@artegence.com> <1207560745.15409.54.camel@admc.win-rar.local> <47F9EE35.5030204@artegence.com> Message-ID: <47F9F0E5.9010308@artegence.com> >> thanks a lot for your script. If you look at [1], i'm the guy that had >> commented on that ;) > :))) > >> I don't seem to have /proc/cluster? is that a rhel4 specific thing >> maybe? or do i need to load something first? > > Yes, it is for rhel4. I don't have any rhel5 with GFS, so I can't help You. Hello, Here is the patch, now It should work with rhel5. --- check_rhcs.original 2006-12-22 13:47:36.000000000 +0100 +++ check_rhcs 2008-04-07 11:59:23.000000000 +0200 @@ -27,13 +27,13 @@ echo "UNKNOWN - $*"; exit 3 } -procfsf=/proc/cluster/services +#procfsf="cman_tool services" if [ ! 
-f $procfsf ] ; then critical "RHCS not running" fi -procfss=$(cat /proc/cluster/services) +procfss=`cman_tool services` check_clvmd=$(echo "$procfss"|grep "^DLM Lock Space"|grep "clvmd"|head -1|awk '{print $7}') check_dlm=$(echo "$procfss"|grep "^DLM Lock Space"|grep -v "clvmd"|head -1|awk '{print $7}') check_fenced=$(echo "$procfss"|grep "^Fence Domain"|head -1|awk '{print $6}') Best Regards Maciej Bogucki From teemu.m2 at luukku.com Mon Apr 7 10:29:11 2008 From: teemu.m2 at luukku.com (m.. mm..) Date: Mon, 7 Apr 2008 13:29:11 +0300 (EEST) Subject: [Linux-cluster] SCSI-fence configure Message-ID: <1207564151655.teemu.m2.27811.rpZq244jT_oNCRlQjmOPCA@luukku.com> Hi Ryan or somebody else.. I have one question about your documentation about RedHat and scsi_reservation fence what you have write. About this Storage Requimenents: You write like this. "all shared storage must use LVM2 cluster volumes" If i have 2 cluster-nodes and shared /data1 mount which are this shared volume, with no scsi-resevation bit on, in active-passive mode service. And i want SCSI-fence like: I configure 2 * 2Gb shared storage more with SCSI-3 reservation bit on and create lvm2-partition, should this fence work, or must i still convert this allready configured /data1 to lvm2, for not preventing data-corruption in some situations. Or can i leave this partition unchanged ------------cut starts:-------------- 3.2 - Storage Requirements In order to use SCSI persistent reservations as a fencing method, all shared storage must use LVM2 cluster volumes. In addition, all devices within these volumes must be SPC-3 compliant. If you are unsure if your cluster and shared storage environment meets these requirements, a script is available to determine if your shared storage devices are capable of using SCSI persistent reservations. See section x.x. ------------ cut ends------------------------------- ................................................................... Luukku Plus paketilla p??set eroon tila- ja turvallisuusongelmista. Hanki Luukku Plus ja helpotat el?m??si. http://www.mtv3.fi/luukku From maciej.bogucki at artegence.com Mon Apr 7 12:07:59 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Mon, 07 Apr 2008 14:07:59 +0200 Subject: [Linux-cluster] SCSI-fence configure In-Reply-To: <1207564151655.teemu.m2.27811.rpZq244jT_oNCRlQjmOPCA@luukku.com> References: <1207564151655.teemu.m2.27811.rpZq244jT_oNCRlQjmOPCA@luukku.com> Message-ID: <47FA0E9F.5010302@artegence.com> m.. mm.. napisa?(a): > Hi Ryan or somebody else.. > > I have one question about your documentation about RedHat and scsi_reservation fence what you have write. > > About this Storage Requimenents: > You write like this. "all shared storage must use LVM2 cluster volumes" > If i have 2 cluster-nodes and shared /data1 mount which are this shared volume, with no scsi-resevation bit on, in active-passive mode service. > And i want SCSI-fence like: I configure 2 * 2Gb shared storage more with SCSI-3 reservation bit on and create lvm2-partition, should this fence work, or must i still convert this allready configured /data1 to lvm2, for not preventing data-corruption in some situations. Or can i leave this partition unchanged Hello, As Ryan said, actual version of fence_scsi works only with LVM2. It is only bash script, so You could change it, if You want. I can't understand the term "active-passive mode service"? Do you use GFS filesystem? 
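Independent of that, a quick way to sanity-check whether a LUN honours SCSI-3 persistent reservations at all is sg_persist from sg3_utils -- a sketch, with /dev/sdb standing in for one path to the shared storage:

---cut---
# both calls should succeed (possibly with empty output) on an SPC-3
# compliant LUN; an "illegal request" type failure suggests the device
# cannot be used for SCSI reservation fencing
sg_persist --in --read-keys --device=/dev/sdb
sg_persist --in --read-reservation --device=/dev/sdb
---cut---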
Best Regards Maciej Bogucki From Alain.Moulle at bull.net Mon Apr 7 13:00:23 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 07 Apr 2008 15:00:23 +0200 Subject: [Linux-cluster] timers tuning (contd) Message-ID: <47FA1AE7.2050600@bull.net> Hi Is there a similar rule with CS5 ? I mean if we increase the heart-beat timeout, is there some other parameters to adjust together ? Thanks Regards Alain Moull? Alain Moulle wrote: >>>> Hi >>>> >>>> is there a rule to follow between the DLM lock_timeout >>>> and the deadnode_timeout value ? >>>> Meaning for example that the first one must be always lesser than >>>> the second one ? >>>> >>>> And if so, could we have a deadnode_timeout=60s and the >>>> /proc/cluster/config/dlm/lock_timeout at 70s ? or are >>>> there some upper limits not to exceed ? The DLM's lock_timeout should always be greater than cman's deadnode_timeout. A sensible minimum is about 1.5 times the cman value, but it can go as high as you like. -- Chrissie From rohara at redhat.com Mon Apr 7 15:09:03 2008 From: rohara at redhat.com (Ryan O'Hara) Date: Mon, 07 Apr 2008 10:09:03 -0500 Subject: [Linux-cluster] SCSI-fence configure In-Reply-To: <1207564151655.teemu.m2.27811.rpZq244jT_oNCRlQjmOPCA@luukku.com> References: <1207564151655.teemu.m2.27811.rpZq244jT_oNCRlQjmOPCA@luukku.com> Message-ID: <47FA390F.7060707@redhat.com> m.. mm.. wrote: > Hi Ryan or somebody else.. > > I have one question about your documentation about RedHat and scsi_reservation fence what you have write. > > About this Storage Requimenents: > You write like this. "all shared storage must use LVM2 cluster volumes" > If i have 2 cluster-nodes and shared /data1 mount which are this shared volume, with no scsi-resevation bit on, in active-passive mode service. > And i want SCSI-fence like: I configure 2 * 2Gb shared storage more with SCSI-3 reservation bit on and create lvm2-partition, should this fence work, or must i still convert this allready configured /data1 to lvm2, for not preventing data-corruption in some situations. Or can i leave this partition unchanged This means you must use LVM2 to setup your shared storage. In other words, you must be using clvmd (lvm2-cluster package). Also note that active-passive arrays are not officially support. I have tested as active-passive array with RHEL5 and it seems to work due to a change in device-mapper-multipath. The reason is that SCSI reservations work by using an ioctl call to the devices and both the active and passive paths must get this ioctl (to create the registration and/or reservation). RHEL5 appears to include a fix that passes this ioctl to all devices. This is not present in RHEL4. I believe the documentation states that active-passive (multipath) arrays are not supported. I will update this when I do more testing. I'm not sure what you mean by "no scsi-resevation bit on". Can you explain? -Ryan > ------------cut starts:-------------- > 3.2 - Storage Requirements > > In order to use SCSI persistent reservations as a fencing method, all > shared storage must use LVM2 cluster volumes. In addition, all devices > within these volumes must be SPC-3 compliant. If you are unsure if > your cluster and shared storage environment meets these requirements, > a script is available to determine if your shared storage devices are > capable of using SCSI persistent reservations. See section x.x. > > ------------ cut ends------------------------------- > > > > > ................................................................... 
> Luukku Plus paketilla p??set eroon tila- ja turvallisuusongelmista. > Hanki Luukku Plus ja helpotat el?m??si. http://www.mtv3.fi/luukku > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lhh at redhat.com Mon Apr 7 21:23:29 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 07 Apr 2008 17:23:29 -0400 Subject: [Linux-cluster] timers tuning (contd) In-Reply-To: <47FA1AE7.2050600@bull.net> References: <47FA1AE7.2050600@bull.net> Message-ID: <1207603409.2927.34.camel@localhost.localdomain> On Mon, 2008-04-07 at 15:00 +0200, Alain Moulle wrote: > Hi > > Is there a similar rule with CS5 ? I mean if we > increase the heart-beat timeout, is there some > other parameters to adjust together ? qdisk timeout should be about a hair more than cman's timeout, if you're using it. -- Lon From lhh at redhat.com Mon Apr 7 21:25:38 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 07 Apr 2008 17:25:38 -0400 Subject: [Linux-cluster] Virtual service without GFS In-Reply-To: <20080406211239.GC32651@localhost> References: <20080406211239.GC32651@localhost> Message-ID: <1207603538.2927.37.camel@localhost.localdomain> On Sun, 2008-04-06 at 23:12 +0200, Jakub Suchy wrote: > Hi, > is it possible to run a virtual service on a cluster (XEN host) without > using GFS? I know I can create an ext3 partition, but it is not possible > to add a resource to virtual service, so I can't join ext3 to it. Yes, but ... * the same rules apply as if you were using GFS * The storage used by the Xen virtual machine must be accessible from all cluster members. * fencing is required For example, you can store the guest images on raw SAN partitions without using GFS if you wish. -- Lon From Christopher.Barry at qlogic.com Tue Apr 8 02:36:46 2008 From: Christopher.Barry at qlogic.com (christopher barry) Date: Mon, 07 Apr 2008 22:36:46 -0400 Subject: [Linux-cluster] dlm and IO speed problem Message-ID: <1207622206.5259.66.camel@localhost> Hi everyone, I have a couple of questions about the tuning the dlm and gfs that hopefully someone can help me with. my setup: 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's not new stuff, but corporate standards dictated the rev of rhat. The cluster is a developer build cluster, where developers login, and are balanced across nodes and edit and compile code. They can access via vnc, XDMCP, ssh and telnet, and nodes external to the cluster can mount the gfs home via nfs, balanced through the director. Their homes are on the gfs, and accessible on all nodes. I'm noticing huge differences in compile times - or any home file access really - when doing stuff in the same home directory on the gfs on different nodes. For instance, the same compile on one node is ~12 minutes - on another it's 18 minutes or more (not running concurrently). I'm also seeing weird random pauses in writes, like saving a file in vi, what would normally take less than a second, may take up to 10 seconds. * From reading, I see that the first node to access a directory will be the lock master for that directory. How long is that node the master? If the user is no longer 'on' that node, is it still the master? If continued accesses are remote, will the master state migrate to the node that is primarily accessing it? I've set LVS persistence for ssh and telnet for 5 minutes, to allow multiple xterms fired up in a script to land on the same node, but new ones later will land on a different node - by design really. 
Do I need to make this persistence way longer to keep people only on the first node they hit? That kind of horks my load balancing design if so. How can I see which node is master for which directories? Is there a table I can read somehow? * I've bumped the wake times for gfs_scand and gfs_inoded to 30 secs, I mount noatime,noquota,nodiratime, and David Teigland recommended I set dlm_dropcount to '0' today on irc, which I did, and I see an improvement in speed on the node that appears to be master for say 'find' command runs on the second and subsequent runs of the command if I restart them immediately, but on the other nodes the speed is awful - worse than nfs would be. On the first run of a find, or If I wait >10 seconds to start another run after the last run completes, the time to run is unbelievably slower than the same command on a standalone box with ext3. e.g. <9 secs on the standalone, compared to 46 secs on the cluster - on a different node it can take over 2 minutes! Yet an immediate re-run on the cluster, on what I think must be the master is sub-second. How can I speed up the first access time, and how can I keep the speed up similar to immediate subsequent runs. I've got a ton of memory - I just do not know which knobs to turn. Am I expecting too much from gfs? Did I oversell it when I literally fought to use it rather than nfs off the NetApp filer, insisting that the performance of gfs smoked nfs? Or, more likely, do I just not understand how to optimize it fully for my application? Regards and Thanks, -C From bevan.broun at ardec.com.au Tue Apr 8 04:31:31 2008 From: bevan.broun at ardec.com.au (Bevan Broun) Date: Tue, 8 Apr 2008 13:31:31 +0900 (EIT) Subject: [Linux-cluster] strange requirements:non reboot of failed node, both shared and non-shared storage on SAN. In-Reply-To: <1207622206.5259.66.camel@localhost> References: <1207622206.5259.66.camel@localhost> Message-ID: <22207.210.9.69.226.1207629091.squirrel@webmail.ardec.com.au> Hi All I have a strange set of requirements: A two node cluster: services running on cluster nodes are not shared (ie not clustered). cluster is only there for two GFS file systems on a SAN. The same storage system hosts non GFS luns for individual use by the cluster members. The nodes run two applications, the critical app does NOT use the GFS. The non critical ap uses the GFS. The critical application uses storage from the SAN for ext3 file systems. The requirement is that a failure of the cluster should not interupt the critical application. This means the failed node cannot be power cycled. Also the failed node must continue to have access to it's non GFS luns on the storage. The Storage are two HP EVAs. Each EVA has two controllers. There are two brocade FC switches. Fencing is required for GFS. The only solution I can think of is: GFS LUNs presented down one HBA only, while ext3 luns are presented down both. Use SAN fencing to block access by the fenced host to GFS luns by blocking access to the controller that is handling this LUN. repairing the cluster will be a manual operation that may involve a reboot. does this look workable? Thanks From mathieu.avila at seanodes.com Tue Apr 8 08:47:53 2008 From: mathieu.avila at seanodes.com (Mathieu Avila) Date: Tue, 8 Apr 2008 10:47:53 +0200 Subject: [Linux-cluster] About GFS1 and I/O barriers. 
In-Reply-To: <1207149428.3635.151.camel@quoit> References: <20080328153458.45fc6e13@mathieu.toulouse> <20080331124651.3f0d2428@mathieu.toulouse> <1206960860.3635.126.camel@quoit> <20080331151622.1360a2cb@mathieu.toulouse> <1207130014.3310.24.camel@localhost.localdomain> <1a2a6dd60804020726g20d77419k47298eb000c431ec@mail.gmail.com> <1207149428.3635.151.camel@quoit> Message-ID: <20080408104753.14db9ed3@mathieu.toulouse> Le Wed, 02 Apr 2008 16:17:08 +0100, Steven Whitehouse a ?crit : > Hi, > > > If the data is not physically on disk when the ACK it sent back, then > there is no way for the fs to know whether the data has (at a later > date) not been written due to some error or other. Even ignoring that > for the moment and assuming that such errors never occur, I don't > think its too unreasonable to expect at a minimum that all > acknowledged I/O will never be reordered with unacknowledged I/O. > That is all that is required for correct operation of gfs1/2 provided > that no media errors occur on write. If I understand correctly your statement, I think you misinterpret what a ACK on write means. For the SCSI protocol, ACKing a write doesn't mean it has reached the platters. >From here: http://t10.org/ftp/t10/drafts/sbc3/sbc3r14.pdf 4.11 Caches - 5th paragraph " During write operations, the device server uses the cache to store data that is to be written to the medium at a later time. This is called write-back caching. The command may complete prior to logical blocks being written to the medium. As a result of using a write-back caching there is a period of time when the data may be lost if power to the SCSI target device is lost and a volatile cache is being used or a hardware failure occurs. There is also the possibility of an error occurring during the subsequent write operation. If an error occurred during the write operation, it may be reported as a deferred error on a later command. " If you want some WRITEs to hit the persistent media, you must issue special commands, like "synchronize cache", or a write with "FUA" (force unit acccess) bit set. All this is correctly (or at least, it should be) handled by the kernel's barriers, if the device supports it. In the case where no barriers are used, there is no guarantee on reordering of WRITEs, so log corruption can occur. >From where I understand the code, ext3 allows to activate barriers with an option on mount, so when the device does not support them, it is still possible to disable the option by remounting the device. For XFS, barriers will be automatically disabled when the device doesn't support them. (well, this is also what i've observed, but take those statements with caution) > > The message on lkml which Mathieu referred to suggested that there > were three kinds of devices, but it seems to be that type 2 > (flushable) doesn't exist so far as the fs is concerned since > blkdev_issue_flush() just issues a BIO with only a barrier in it. A > device driver might support the barrier request by either waiting for > all outstanding I/O and issuing a flush command (if required) or by > passing the barrier down to the device, assuming that it supports > such a thing directly. > > Further down the message (the url is http://lkml.org/lkml/2007/5/25/71 > btw) there is a list of dm/md implementation status and it seems that > for a good number of the common targets there is little or no support > for barriers anyway at the moment. 
> > Now I agree that it would be nice to support barriers in GFS2, but it > won't solve any problems relating to ordering of I/O unless all of the > underlying device supports them too. See also Alasdair's response to > the thread: http://lkml.org/lkml/2007/5/28/81 > > So although I'd like to see barrier support in GFS2, it won't solve > any problems for most people and really its a device/block layer > issue at the moment. > > Steve. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From s.wendy.cheng at gmail.com Tue Apr 8 09:13:52 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 8 Apr 2008 04:13:52 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <1207622206.5259.66.camel@localhost> References: <1207622206.5259.66.camel@localhost> Message-ID: <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> On Mon, Apr 7, 2008 at 9:36 PM, christopher barry < Christopher.Barry at qlogic.com> wrote: > Hi everyone, > > I have a couple of questions about the tuning the dlm and gfs that > hopefully someone can help me with. There are lots to say about this configuration.. It is not a simple tuning issue. > > my setup: > 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's > not new stuff, but corporate standards dictated the rev of rhat. Putting a load balancer in front of cluster filesystem is tricky to get it right (to say the least). This is particularly true between GFS and LVS, mostly because LVS is a general purpose load balancer that is difficult to tune to work with the existing GFS locking overhead. The cluster is a developer build cluster, where developers login, and > are balanced across nodes and edit and compile code. They can access via > vnc, XDMCP, ssh and telnet, and nodes external to the cluster can mount > the gfs home via nfs, balanced through the director. Their homes are on > the gfs, and accessible on all nodes. Direct login into GFS nodes (via vnc, ssh, telnet, etc) is ok but nfs client access in this setup will have locking issues. It is *not* only a performance issue. It is *also* a function issue - that is, before 2.6.19 Linux kernel, NLM locking (used by NFS client) doesn't get propagated into clustered NFS servers. You'll have file corruption if different NFS clients do file lockings and expect the lockings can be honored across different clustered NFS servers. In general, people needs to think *very* carefully to put a load balancer before a group of linux NFS servers using any before-2.6.19 kernel. It is not going to work if there are multiple clients that invoke either posix locks and/or flocks on files that are expected to get accessed across different linux NFS servers on top *any* cluster filesystem (not only GFS). . > > > I'm noticing huge differences in compile times - or any home file access > really - when doing stuff in the same home directory on the gfs on > different nodes. For instance, the same compile on one node is ~12 > minutes - on another it's 18 minutes or more (not running concurrently). > I'm also seeing weird random pauses in writes, like saving a file in vi, > what would normally take less than a second, may take up to 10 seconds. > > * From reading, I see that the first node to access a directory will be > the lock master for that directory. How long is that node the master? If > the user is no longer 'on' that node, is it still the master? 
If > continued accesses are remote, will the master state migrate to the node > that is primarily accessing it? Cluster locking is expensive. As the result, GFS caches its glocks and there is an one-to-one correspondence between GFS glock and DLM locks. Even an user is no longer "on" that node, the lock stays on that node unless: 1. some other node requests an exclusive access of this lock (file write); or 2. the node has memory pressure that kicks off linux virtual memory manager to reclaim idle filesystem structures (inode, dentries, etc); or 3. abnormal events such as crash, umount, etc. Check out: , http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs for details. I've set LVS persistence for ssh and > telnet for 5 minutes, to allow multiple xterms fired up in a script to > land on the same node, but new ones later will land on a different node > - by design really. Do I need to make this persistence way longer to > keep people only on the first node they hit? That kind of horks my load > balancing design if so. How can I see which node is master for which > directories? Is there a table I can read somehow? You did the right thing here (by making the connection persistence). There is a gfs glock dump command that can print out all the lock info (name, owner, etc) but I really don't want to recommend it - since automating this process is not trivial and there is no way to do this by hand, i.e. manually. > > * I've bumped the wake times for gfs_scand and gfs_inoded to 30 secs, I > mount noatime,noquota,nodiratime, and David Teigland recommended I set > dlm_dropcount to '0' today on irc, which I did, and I see an improvement > in speed on the node that appears to be master for say 'find' command > runs on the second and subsequent runs of the command if I restart them > immediately, but on the other nodes the speed is awful - worse than nfs > would be. On the first run of a find, or If I wait >10 seconds to start > another run after the last run completes, the time to run is > unbelievably slower than the same command on a standalone box with ext3. > e.g. <9 secs on the standalone, compared to 46 secs on the cluster - on > a different node it can take over 2 minutes! Yet an immediate re-run on > the cluster, on what I think must be the master is sub-second. How can I > speed up the first access time, and how can I keep the speed up similar > to immediate subsequent runs. I've got a ton of memory - I just do not > know which knobs to turn. The more memory you have, the more gfs locks (and their associated gfs file structures) will be cached in the node. It, in turns, will make both dlm and gfs lock queries take longer. The glock_purge (on RHEL 4.6, not on RHEL 4.5) should be able to help but its effects will be limited if you ping-pong the locks quickly between different GFS nodes. Try to play around with this tunable (start with 20%) to see how it goes (but please reset gfs_scand and gfs_inoded back to their defaults while you are experimenting glock_purge). So assume this is a build-compile cluster, implying large amount of small files come and go, The tricks I can think of: 1. glock_purge ~ 20% 2. glock_inode shorter than default (not longer) 3. persistent LVS session if all possible > > > Am I expecting too much from gfs? Did I oversell it when I literally > fought to use it rather than nfs off the NetApp filer, insisting that > the performance of gfs smoked nfs? 
Or, more likely, do I just not > understand how to optimize it fully for my application? GFS1 is very good on large sequential IO (such as vedio-on-demand) but works poorly in the environment you try to setup. However, I'm in an awkward position to do further comments I'll stop here. -- Wendy > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Tue Apr 8 10:05:25 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Tue, 8 Apr 2008 11:05:25 +0100 (BST) Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <1207622206.5259.66.camel@localhost> References: <1207622206.5259.66.camel@localhost> Message-ID: > my setup: > 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's > not new stuff, but corporate standards dictated the rev of rhat. [...] > I'm noticing huge differences in compile times - or any home file access > really - when doing stuff in the same home directory on the gfs on > different nodes. For instance, the same compile on one node is ~12 > minutes - on another it's 18 minutes or more (not running concurrently). > I'm also seeing weird random pauses in writes, like saving a file in vi, > what would normally take less than a second, may take up to 10 seconds. > > * From reading, I see that the first node to access a directory will be > the lock master for that directory. How long is that node the master? If > the user is no longer 'on' that node, is it still the master? If > continued accesses are remote, will the master state migrate to the node > that is primarily accessing it? I've set LVS persistence for ssh and > telnet for 5 minutes, to allow multiple xterms fired up in a script to > land on the same node, but new ones later will land on a different node > - by design really. Do I need to make this persistence way longer to > keep people only on the first node they hit? That kind of horks my load > balancing design if so. How can I see which node is master for which > directories? Is there a table I can read somehow? > > * I've bumped the wake times for gfs_scand and gfs_inoded to 30 secs, I > mount noatime,noquota,nodiratime, and David Teigland recommended I set > dlm_dropcount to '0' today on irc, which I did, and I see an improvement > in speed on the node that appears to be master for say 'find' command > runs on the second and subsequent runs of the command if I restart them > immediately, but on the other nodes the speed is awful - worse than nfs > would be. On the first run of a find, or If I wait >10 seconds to start > another run after the last run completes, the time to run is > unbelievably slower than the same command on a standalone box with ext3. > e.g. <9 secs on the standalone, compared to 46 secs on the cluster - on > a different node it can take over 2 minutes! Yet an immediate re-run on > the cluster, on what I think must be the master is sub-second. How can I > speed up the first access time, and how can I keep the speed up similar > to immediate subsequent runs. I've got a ton of memory - I just do not > know which knobs to turn. It sounds like bumping up lock trimming might help, but I don't think the feature accessibility through /sys has been back-ported to RHEL4, so if you're stuck with RHEL4, you may have to rebuild the latest versions of the tools and kernel modules from RHEL5, or you're out of luck. > Am I expecting too much from gfs? 
Did I oversell it when I literally > fought to use it rather than nfs off the NetApp filer, insisting that > the performance of gfs smoked nfs? Or, more likely, do I just not > understand how to optimize it fully for my application? Probably a combination of all of the above. The main advantage of GFS isn't speed, it's the fact that it is a proper POSIX file system, unlike NFS or CIFS (e.g. file locking actually works on GFS). It also tends to stay consistent if a node fails, due to journalling. Having said that, I've not seen speed differences as big as what you're describing, but I'm using RHEL5. I also have bandwidth charts for my DRBD/cluster interface, and the bandwidth usage on a lightly loaded system is not really signifficant unless lots of writes start happening. With mostly reads (which can all be served from the local DRBD mirror), the background "noise" traffic of combined DRBD and RHCS is > 200Kb/s (25KB/s). Since the ping times are < 0.1ms, in theory, this should make locks take < 1ms to resolve/migrate. Of course, if your find goes over 50,000 files, the a 50 second delay to migrate all the locks may well be in a reasonable ball-park. You may find that things have moved on quite a bit since RHEL4... Gordan From paolom at prisma-eng.it Tue Apr 8 12:51:01 2008 From: paolom at prisma-eng.it (Paolo Marini) Date: Tue, 08 Apr 2008 14:51:01 +0200 Subject: [Linux-cluster] Problems with SAMBA server on Centos 51 virtual xen guest with iSCSI SAN In-Reply-To: <47F3E82F.6030600@redhat.com> References: <47F3C07A.8090709@prisma-eng.it> <47F3E82F.6030600@redhat.com> Message-ID: <47FB6A35.3030407@prisma-eng.it> After some investigation, it seems that the problem is really related to samba and not to the cluster infrastructure which is working quite well. Here some posting on the issue with samba, that was exploited with the upgrade to 3.0.25 included in the RH 5.1 update: http://bugs.contribs.org/show_bug.cgi?id=3762 http://www.centos.org/modules/newbb/viewtopic.php?post_id=39829&topic_id=12152 https://bugzilla.redhat.com/show_bug.cgi?id=426244 What I did to solve the problem was to get the latest samba sources (3.0.28a) and rebuild the package updating the spec file. I commented out the patches from 115 onwards as they are already included in the samba 3.0.28a tarball. After the upgrade, none of the problems mentioned by me and in the above reported links happened again. Hope this helps other folks solve the same problem, and also convinces RH people to upgrade the sasmba package. Paolo John Ruemker ha scritto: > Paolo Marini wrote: >> I have implemented a cluster of a few xen guest with a shared GFS >> filesystem residing on a SAN build with openfiler to support iSCSI >> storage. >> >> Physical servers are 3 machines implementing a physical cluster, each >> one equipped with quad xeon and 4 G RAM. The network interface is >> based on channel bonding with LACP (on the physical hosts) having an >> aggregate of 2 gigabits ethernet per physical host, the switch >> supports LACP and has been configured accordingly. >> >> Virtual servers are based on xen nodes on top of the physical server >> with shared storage on iSCSI and GFS. >> >> The networking is based on a cluster private network (for cluster >> heartbeat and cluster communication + iSCSI) and an ethernet alias >> for the LAN to which the users are connected. 
>> >> One of the cluster xen nodes is used for implementing a samba PDC (no >> failover of the service, plain samba, single samba server on the LAN) >> plus ldap server; samba works with ldap for users authentication. >> Storage for the samba server is on the SAN. >> >> I continue to receive complaints from my users due to the fact that >> sometimes copying file generates errors, plus problems related to >> office usage (we still use the old Office 97 on some machines). The >> samba configuration is more or less the same as that correctly >> working on the previous physical machine, on which those problems >> were not present. >> >> The problems generate these log entries on /var/log/samba/smbd: >> >> [2008/04/02 19:00:50, 0] lib/util_sock.c:get_peer_addr(1232) >> getpeername failed. Error was Transport endpoint is not connected >> [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) >> getpeername failed. Error was Transport endpoint is not connected >> [2008/04/02 19:05:32, 0] lib/util_sock.c:get_peer_addr(1232) >> getpeername failed. Error was Transport endpoint is not connected >> >> And on the client machine log also on /var/log/samba >> >> [2008/04/02 19:04:34, 0] lib/util_sock.c:read_data(534) >> read_data: read failure for 4 bytes to client 192.168.13.240. Error >> = Connection reset by peer >> [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) >> amhwq53p (192.168.13.240) closed connection to service tmp >> [2008/04/02 19:04:34, 1] smbd/service.c:close_cnum(1230) >> amhwq53p (192.168.13.240) closed connection to service stock >> [2008/04/02 19:04:34, 0] lib/util_sock.c:write_data(562) >> write_data: write failure in writing to client 192.168.13.240. Error >> Broken pipe >> [2008/04/02 19:04:34, 0] lib/util_sock.c:send_smb(769) >> Error writing 75 bytes to client. -1. (Broken pipe) >> [2008/04/02 19:04:34, 1] smbd/service.c:make_connection_snum(1033) >> >> They seem similar to problems related to poor connectivity or problem >> in the network; however, these problems are new and were never found >> before switching to the clustered architecture. Also no problem have >> been found so far on the other xen nodes serving the same GFS >> filesystem (different dirs !) for NFS or other services. >> >> Also putting the option >> >> posix locking = no >> >> on the smb.conf file did not help. >> >> Any idea from someone else facing the same problems ? >> >> thanks, Paolo >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > Those errors are explained in > > http://kbase.redhat.com/faq/FAQ_45_5274.shtm > > John > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From s.wendy.cheng at gmail.com Tue Apr 8 14:37:58 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 08 Apr 2008 09:37:58 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: References: <1207622206.5259.66.camel@localhost> Message-ID: <47FB8346.1070802@gmail.com> gordan at bobich.net wrote: > > >> my setup: >> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's >> not new stuff, but corporate standards dictated the rev of rhat. > [...] >> I'm noticing huge differences in compile times - or any home file access >> really - when doing stuff in the same home directory on the gfs on >> different nodes. For instance, the same compile on one node is ~12 >> minutes - on another it's 18 minutes or more (not running concurrently). 
>> I'm also seeing weird random pauses in writes, like saving a file in vi, >> what would normally take less than a second, may take up to 10 seconds. >> >> * From reading, I see that the first node to access a directory will be >> the lock master for that directory. How long is that node the master? If >> the user is no longer 'on' that node, is it still the master? If >> continued accesses are remote, will the master state migrate to the node >> that is primarily accessing it? I've set LVS persistence for ssh and >> telnet for 5 minutes, to allow multiple xterms fired up in a script to >> land on the same node, but new ones later will land on a different node >> - by design really. Do I need to make this persistence way longer to >> keep people only on the first node they hit? That kind of horks my load >> balancing design if so. How can I see which node is master for which >> directories? Is there a table I can read somehow? >> >> * I've bumped the wake times for gfs_scand and gfs_inoded to 30 secs, I >> mount noatime,noquota,nodiratime, and David Teigland recommended I set >> dlm_dropcount to '0' today on irc, which I did, and I see an improvement >> in speed on the node that appears to be master for say 'find' command >> runs on the second and subsequent runs of the command if I restart them >> immediately, but on the other nodes the speed is awful - worse than nfs >> would be. On the first run of a find, or If I wait >10 seconds to start >> another run after the last run completes, the time to run is >> unbelievably slower than the same command on a standalone box with ext3. >> e.g. <9 secs on the standalone, compared to 46 secs on the cluster - on >> a different node it can take over 2 minutes! Yet an immediate re-run on >> the cluster, on what I think must be the master is sub-second. How can I >> speed up the first access time, and how can I keep the speed up similar >> to immediate subsequent runs. I've got a ton of memory - I just do not >> know which knobs to turn. > > It sounds like bumping up lock trimming might help, but I don't think > the feature accessibility through /sys has been back-ported to RHEL4, > so if you're stuck with RHEL4, you may have to rebuild the latest > versions of the tools and kernel modules from RHEL5, or you're out of > luck. Glock trimming patch was mostly written and tuned on top of RHEL 4. It doesn't use /sys interface. The original patch was field tested on several customer production sites. Upon CVS RHEL 4.5 check-in, it was revised to use a less aggressive approach and turned out to be not as effective as the original approach. So the original patch was re-checked into RHEL 4.6. I wrote the patch. -- Wendy From garromo at us.ibm.com Tue Apr 8 17:28:42 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 8 Apr 2008 11:28:42 -0600 Subject: [Linux-cluster] Tunable parameters In-Reply-To: <1207603538.2927.37.camel@localhost.localdomain> Message-ID: Anyone know where I can get details about these files? # pwd /proc/cluster/config/cman # ls deadnode_timeout join_timeout max_retries transition_restarts hello_timer joinwait_timeout newcluster_timeout transition_timeout joinconf_timeout max_nodes sm_debug_size I am looking for definitions and for the ability to modify (if necessary). Thank you. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From garromo at us.ibm.com Tue Apr 8 17:31:40 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 8 Apr 2008 11:31:40 -0600 Subject: [Linux-cluster] timers tuning (contd) In-Reply-To: <1207603409.2927.34.camel@localhost.localdomain> Message-ID: How do you increase the hearbeat timeout? Gary Romo Lon Hohberger Sent by: linux-cluster-bounces at redhat.com 04/07/2008 03:23 PM Please respond to linux clustering To linux clustering cc Subject Re: [Linux-cluster] timers tuning (contd) On Mon, 2008-04-07 at 15:00 +0200, Alain Moulle wrote: > Hi > > Is there a similar rule with CS5 ? I mean if we > increase the heart-beat timeout, is there some > other parameters to adjust together ? qdisk timeout should be about a hair more than cman's timeout, if you're using it. -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcse47 at hotmail.com Tue Apr 8 19:33:39 2008 From: mcse47 at hotmail.com (Tracey Flanders) Date: Tue, 8 Apr 2008 15:33:39 -0400 Subject: [Linux-cluster] Can you shrink your GFS Volume? In-Reply-To: <20080408160009.676B26192A4@hormel.redhat.com> References: <20080408160009.676B26192A4@hormel.redhat.com> Message-ID: I need to add another journal to my GFS volume to add another cluster node. But I noticed that I get an error about free blocks. Here's the command I'm running and the message I receive: gfs_jadd -v -j1 /mnt/gfs1 Requested size (32768 blocks) greater than available space (3 blocks) This makes perfect sense since I don't have any free space outside the gfs formatted volume. Is it possible at all to shrink a GFS volume? Or do I need to add more space to my lvm volume? Simply, can GFS filesystems be shrunk? _________________________________________________________________ Use video conversation to talk face-to-face with Windows Live Messenger. http://www.windowslive.com/messenger/connect_your_way.html?ocid=TXT_TAGLM_WL_Refresh_messenger_video_042008 From tiagocruz at forumgdh.net Tue Apr 8 19:40:03 2008 From: tiagocruz at forumgdh.net (Tiago Cruz) Date: Tue, 08 Apr 2008 16:40:03 -0300 Subject: [Linux-cluster] Can you shrink your GFS Volume? In-Reply-To: References: <20080408160009.676B26192A4@hormel.redhat.com> Message-ID: <1207683603.15852.25.camel@tuxkiller.ig.com.br> On Tue, 2008-04-08 at 15:33 -0400, Tracey Flanders wrote: > Simply, can GFS filesystems be shrunk? ?Did you saw this? https://www.redhat.com/archives/linux-cluster/2008-April/msg00076.html From kadlec at sunserv.kfki.hu Tue Apr 8 21:09:26 2008 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef) Date: Tue, 8 Apr 2008 23:09:26 +0200 (CEST) Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> Message-ID: On Tue, 8 Apr 2008, Wendy Cheng wrote: > The more memory you have, the more gfs locks (and their associated gfs file > structures) will be cached in the node. It, in turns, will make both dlm and > gfs lock queries take longer. The glock_purge (on RHEL 4.6, not on RHEL 4.5) > should be able to help but its effects will be limited if you ping-pong the > locks quickly between different GFS nodes. Try to play around with this > tunable (start with 20%) to see how it goes (but please reset gfs_scand and > gfs_inoded back to their defaults while you are experimenting glock_purge). 
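As a concrete illustration of the tuning described just above -- the mount point and numbers are only examples, and glock_purge only exists where the distribution ships it (e.g. RHEL 4.6) -- the GFS1 tunables are normally adjusted per mount with gfs_tool:

    # list current tunables for a GFS1 mount
    gfs_tool gettune /mnt/gfs

    # start glock trimming at 20%
    gfs_tool settune /mnt/gfs glock_purge 20

    # put the daemon wake-up intervals back to their defaults while testing
    gfs_tool settune /mnt/gfs scand_secs 5
    gfs_tool settune /mnt/gfs inoded_secs 15

Values set this way do not survive a remount, so they are usually re-applied from an init or mount script.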
> > So assume this is a build-compile cluster, implying large amount of small > files come and go, The tricks I can think of: > > 1. glock_purge ~ 20% > 2. glock_inode shorter than default (not longer) > 3. persistent LVS session if all possible What is glock_inode? Does it exist or something equivalent in cluster-2.01.00? Isn't GFS_GL_HASH_SIZE too small for large amount of glocks? Being too small it results not only long linked lists but clashing at the same bucket will block otherwise parallel operations. Wouldn't it help increasing it from 8k to 65k? Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From robertofratelli at yahoo.com Tue Apr 8 23:41:04 2008 From: robertofratelli at yahoo.com (Roberto Fratelli) Date: Tue, 8 Apr 2008 16:41:04 -0700 (PDT) Subject: [Linux-cluster] Node fencing without an apparent reason Message-ID: <347183.45719.qm@web33302.mail.mud.yahoo.com> Hello Everyone. I've been reading about post_fail_delay option and i would like to hear your thoughts. I have a 2 node cluster using GFS mounts. I want to prevent a "not so dead" node being fenced by the other node by increasing post_fail_delay value. Nowdays, i have it set to 0 I'm using DRAC as a fencing device, but ofter i saw one node fencing the other one without an apparent reason (no network / quorum disk failures) and i'm not happy with that... I've read about the risks of having the active node replaying other's node GFS Journal and then having the 2nd node write on GFS again i can get GFS Metadata corruption, but how long (seconds) this whole procedure occurs ? Is it safe to increase post_fail_delay to something like 5 seconds ? Thanks ! Roberto Fratelli ____________________________________________________________________________________ You rock. That's why Blockbuster's offering you one month of Blockbuster Total Access, No Cost. http://tc.deals.yahoo.com/tc/blockbuster/text5.com From Alain.Moulle at bull.net Wed Apr 9 06:57:21 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 09 Apr 2008 08:57:21 +0200 Subject: [Linux-cluster] CS5 / timers tuning (contd) Message-ID: <47FC68D1.3000004@bull.net> Hi Lon, and thanks for your answer, but : about "qdisk timeout" , you mean "tko" parameter in cluster.conf ? Because I can't see a qdisk "timeout" parameter in all your qdisk parameter description ... And if it is "tko", what is the value in seconds of one cyle, so that we can adjust it a hair more than cman's timeout ? Thanks for these details. Regards Alain Moull? On Mon, 2008-04-07 at 15:00 +0200, Alain Moulle wrote: >> Hi >> >> Is there a similar rule with CS5 ? I mean if we >> increase the heart-beat timeout, is there some >> other parameters to adjust together ? qdisk timeout should be about a hair more than cman's timeout, if you're using it. -- Lon From j.buzzard at dundee.ac.uk Wed Apr 9 09:06:12 2008 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Wed, 09 Apr 2008 10:06:12 +0100 Subject: [Linux-cluster] Writing new fencing agent Message-ID: <1207731972.9236.51.camel@localhost.lifesci.dundee.ac.uk> I am re-purposing an old cluster that used to run RHEL4 and IBM's GPFS. The nodes are all HP NetServer LP1000r with 2GB RAM, and dual 1.4GHz PIII's with an additional 1Gbps Intel NIC, and a local 73GB 10k RPM SCSI disk. I have 48 of these nodes (and a couple spare). 
As the GPFS and RedHat licenses have been transferred to new machines, it is my intention to rebuild the nodes using CentOS 5 and use GFS. I have a couple TB of iSCSI storage to go with it. This is a low budget project and I need a fencing device. The nodes all support something called "Alert on Lan v2", which seems to have been a fore runner of IPMI. I have a separate "management" network, and have turned AOL on in the BIOS on each node. Googling turned up no documentation on how Alert on Lan works so some time later with Wireshark and the windows client I have some C code that sends magic packets of death to either power off, reset, or power cycle (off wait 15 seconds then on) the nodes. Testing shows that it is robust in that it works on a node that has kernel panicked and is otherwise totally hung. It is also fast, once magic packet of death received the node is off instantly. All that seems to be required on the client side is for the management NIC to be up and configured with an IP address. This is contrary to the suggestion that client software is need according to the rather sketchy HP documentation. All good so far. However I am not sure what the requirements of a fencing agent are. Can I rename my program fence_aol2 fiddle with cluster.conf and it will work? Does the fencing agent have to return specific exit codes? Should the fencing agent do something to test the magic packet of death worked or is simply sending it enough? Does the fencing agent need to be able to turn nodes on (I could use Wake On Lan for this) as well as off? Finally once I have a working and debugged AOL2 fencing agent, how does one go about submitting for inclusion in cluster suite. Alternatively if this is not wanted (Alert on Lan is a historical protocol and superseded by IPMI) what is the best way of pointing other users to it's existance? JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH From pmshehzad at yahoo.com Wed Apr 9 09:57:58 2008 From: pmshehzad at yahoo.com (Mshehzad Pankhawala) Date: Wed, 9 Apr 2008 02:57:58 -0700 (PDT) Subject: [Linux-cluster] How to configure Squid Server Failover in RHCS Message-ID: <571371.33505.qm@web45802.mail.sp1.yahoo.com> Thanks to everyone , I am testing redhat cluster suite, I have successfully configured Apache Failover service using RHCS (in RHEL5). Now i am testing a Squid server Failover. One IP address (ex. 192.168.0.111) is allowed to connect to the internet directly. I have two RHEL5 Server having Squid installed on it. and i want to configure Squid Fail over using RHCS on those two servers. But I can't find in Resource List Resource Like Squid Server (As like Apache Server). After that I wrote a script to start the Squid Server And added it as Script resource but it failed. The proble is that I am getting requests from clients at 3128 port of Squid but they won't connect to internet at all. I have desabled SELinux and Firewall from the begning of this testing. Please Reply my with some alternatives Thanks in Advance, Regards. MShehzad __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mgrac at redhat.com Wed Apr 9 14:32:21 2008 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Wed, 09 Apr 2008 16:32:21 +0200 Subject: [Linux-cluster] Writing new fencing agent In-Reply-To: <1207731972.9236.51.camel@localhost.lifesci.dundee.ac.uk> References: <1207731972.9236.51.camel@localhost.lifesci.dundee.ac.uk> Message-ID: <47FCD375.5020807@redhat.com> Hi, Jonathan Buzzard wrote: > All good so far. However I am not sure what the requirements of a > fencing agent are. > Can I rename my program fence_aol2 fiddle with cluster.conf and it will work? You have to set 'agent' option: These options will be set to fencing agent on STDIN. Also there is a set of getopt arguments (look at existing code). > Does the fencing agent have to return specific exit codes? You should return 0 when operation was finished succesfully. > Should the fencing agent do something to test the magic packet of death worked or is simply sending it enough? All 'standard' fencing agents when rebooting are doing these actions: 1) power off 2) test if the plug/machine is powered off [sometimes it take few seconds] 3) power on > Does the > fencing agent need to be able to turn nodes on (I could use Wake On Lan > for this) as well as off? > > yes, it should. > Finally once I have a working and debugged AOL2 fencing agent, how does > one go about submitting for inclusion in cluster suite. Alternatively if > this is not wanted (Alert on Lan is a historical protocol and superseded > by IPMI) what is the best way of pointing other users to it's existance? > > You can take a look at new fencing agents (available in git / master branch). They use a python module for common fencing tasks and it should not be a problem to write a new fencing agent (agent for APC devices has 3kB). If you will find any problem with new agents don't hesitate and contact me. Marek Grac -- Marek Grac Red Hat Czech s.r.o. From lhh at redhat.com Wed Apr 9 14:58:48 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 09 Apr 2008 10:58:48 -0400 Subject: [Linux-cluster] CS5 / timers tuning (contd) In-Reply-To: <47FC68D1.3000004@bull.net> References: <47FC68D1.3000004@bull.net> Message-ID: <1207753128.15132.89.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-04-09 at 08:57 +0200, Alain Moulle wrote: > Hi Lon, and thanks for your answer, but : > > about "qdisk timeout" , you mean "tko" parameter in cluster.conf ? > Because I can't see a qdisk "timeout" parameter in all your qdisk > parameter description ... Right, interval + tko interval * tko = qdisk timeout See the qdisk man page for more details. -- Lon From lhh at redhat.com Wed Apr 9 15:06:36 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 09 Apr 2008 11:06:36 -0400 Subject: [Linux-cluster] timers tuning (contd) In-Reply-To: References: Message-ID: <1207753596.15132.93.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-04-08 at 11:31 -0600, Gary Romo wrote: > > How do you increase the hearbeat timeout? On cluster2 (rhel5-ish): ... where x is the number of _milliseconds_. Default is 5000 (5 seconds). On cluster1 (rhel4-ish), I don't recall if there's a way to do it from cluster.conf; but in your cman initscript: echo x > /proc/cluster/config/cman/deadnode_timeout ... where x is the number of _seconds_. Default is 21. 
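A minimal sketch of the two settings described above, with purely illustrative numbers (21000 ms and 21 s are examples, not recommendations):

    <!-- RHEL 5 style: /etc/cluster/cluster.conf, inside the <cluster> element -->
    <totem token="21000"/>

    # RHEL 4 style: runtime change via /proc
    echo 21 > /proc/cluster/config/cman/deadnode_timeout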
-- Lon From s.wendy.cheng at gmail.com Wed Apr 9 15:06:08 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 09 Apr 2008 10:06:08 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> Message-ID: <47FCDB60.8030807@gmail.com> Kadlecsik Jozsef wrote: > > What is glock_inode? Does it exist or something equivalent in > cluster-2.01.00? > Sorry, typo. What I mean is "inoded_secs" (gfs inode daemon wake-up time). This is the daemon that reclaims deleted inodes. Don't set it too small though. > > Isn't GFS_GL_HASH_SIZE too small for large amount of glocks? Being too > small it results not only long linked lists but clashing at the same > bucket will block otherwise parallel operations. Wouldn't it help > increasing it from 8k to 65k? > Worth a try. However, the issues involved here are more than lock searching time. It also has to do with cache flushing. GFS currently accumulates too much dirty caches. When it starts to flush, it will pause the system for too long. Glock trimming helps - since cache flush is part of glock releasing operation. -- Wendy From isplist at logicore.net Wed Apr 9 17:18:00 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Wed, 9 Apr 2008 12:18:00 -0500 Subject: [Linux-cluster] SSC Console? Message-ID: <20084912180.894658@leena> Anyone know what an SSC console is? I'm trying to migrate some data between storage devices but the device requires that I enter the command from an SSC Console. EXAMPLE; mylinuxsystem:~/tmp$ssc 192.168.43.70 Just like ssh, telnet, etc. Can't find this anywhere. Since everyone here is into storage, thought I'd ask. Mike From s.wendy.cheng at gmail.com Wed Apr 9 17:54:27 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 09 Apr 2008 12:54:27 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <47FCDB60.8030807@gmail.com> References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> Message-ID: <47FD02D3.7010802@gmail.com> Wendy Cheng wrote: > Kadlecsik Jozsef wrote: >> >> What is glock_inode? Does it exist or something equivalent in >> cluster-2.01.00? >> > > Sorry, typo. What I mean is "inoded_secs" (gfs inode daemon wake-up > time). This is the daemon that reclaims deleted inodes. Don't set it > too small though. Have been responding to this email from top of the head, based on folks' descriptions. Please be aware that they are just rough thoughts and the responses may not fit in general cases. The above is mostly for the original problem description where: 1. The system is designated for build-compile - my take is that there are many temporary and deleted files. 2. The gfs_inode tunable was changed (to 30, instead of default, 15). > >> >> Isn't GFS_GL_HASH_SIZE too small for large amount of glocks? Being >> too small it results not only long linked lists but clashing at the >> same bucket will block otherwise parallel operations. Wouldn't it >> help increasing it from 8k to 65k? >> > > Worth a try. Now I remember .... we did experiment with different hash sizes when this latency issue was first reported two years ago. It didn't make much difference. The cache flushing, on the other hand, was more significant. -- Wendy > > However, the issues involved here are more than lock searching time. > It also has to do with cache flushing. GFS currently accumulates too > much dirty caches. 
When it starts to flush, it will pause the system > for too long. Glock trimming helps - since cache flush is part of > glock releasing operation. > > > > From kadlec at sunserv.kfki.hu Wed Apr 9 19:42:33 2008 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef) Date: Wed, 9 Apr 2008 21:42:33 +0200 (CEST) Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <47FD02D3.7010802@gmail.com> References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> <47FD02D3.7010802@gmail.com> Message-ID: On Wed, 9 Apr 2008, Wendy Cheng wrote: > Have been responding to this email from top of the head, based on folks' > descriptions. Please be aware that they are just rough thoughts and the > responses may not fit in general cases. The above is mostly for the original > problem description where: > > 1. The system is designated for build-compile - my take is that there are many > temporary and deleted files. > 2. The gfs_inode tunable was changed (to 30, instead of default, 15). I'll take it into account when experimenting with the different settings. > > > Isn't GFS_GL_HASH_SIZE too small for large amount of glocks? Being too > > > small it results not only long linked lists but clashing at the same > > > bucket will block otherwise parallel operations. Wouldn't it help > > > increasing it from 8k to 65k? > > > > Worth a try. > > Now I remember .... we did experiment with different hash sizes when this > latency issue was first reported two years ago. It didn't make much > difference. The cache flushing, on the other hand, was more significant. What led me to suspect clashing in the hash (or some other lock-creating issue) was the simple test I made on our five node cluster: on one node I ran find /gfs -type f -exec cat {} > /dev/null \; and on another one just started an editor, naming a non-existent file. It took multiple seconds while the editor "opened" the file. What else than creating the lock could delay the process so long? > > However, the issues involved here are more than lock searching time. It also > > has to do with cache flushing. GFS currently accumulates too much dirty > > caches. When it starts to flush, it will pause the system for too long. > > Glock trimming helps - since cache flush is part of glock releasing > > operation. But 'flushing when releasing glock' looks as a side effect. I mean, isn't there a more direct way to control the flushing? I can easily be totally wrong, but on the one hand, it's good to keep as many locks cached as possible, because lock creation is expensive. But on the other hand, trimming locks triggers flushing, which helps to keep the systems running more smoothly. So a tunable to control flushing directly would be better than just trimming the locks, isn't it. But not knowing the deep internals of GFS, my reasoning can of course be bogus. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 
49, Hungary From gordan at bobich.net Wed Apr 9 20:08:39 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 9 Apr 2008 21:08:39 +0100 (BST) Subject: [Linux-cluster] How to configure Squid Server Failover in RHCS In-Reply-To: <571371.33505.qm@web45802.mail.sp1.yahoo.com> References: <571371.33505.qm@web45802.mail.sp1.yahoo.com> Message-ID: On Wed, 9 Apr 2008, Mshehzad Pankhawala wrote: > Now i am testing a Squid server Failover. > > One IP address (ex. 192.168.0.111) is allowed to connect to the internet directly. > > I have two RHEL5 Server having Squid installed on it. and i want to configure Squid Fail over > using RHCS on those two servers. But I can't find in Resource List Resource Like Squid Server > (As like Apache Server). > > After that I wrote a script to start the Squid Server And added it as Script resource but it > failed. The proble is that I am getting requests from clients at 3128 port of Squid but they > won't connect to internet at all. > > I have desabled SELinux and Firewall from the begning of this testing. Squid can be rather funny with what IP addresses/interfaces it binds to. I've found it works in a hot-failover setup (always runs on both servers) if you tell it to bind to a port without specifying the IPs (so it binds to all interfaces/IPs), and just fail over the IP - no need to bother specifying the fail-over service. Mind you, same holds true for Apache - you might as well have your servers as a load-balanced pair, rather than warm spare fail-over. Gordan From s.wendy.cheng at gmail.com Wed Apr 9 20:41:37 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 09 Apr 2008 15:41:37 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> <47FD02D3.7010802@gmail.com> Message-ID: <47FD2A01.1030708@gmail.com> Kadlecsik Jozsef wrote: > On Wed, 9 Apr 2008, Wendy Cheng wrote: > > >> Have been responding to this email from top of the head, based on folks' >> descriptions. Please be aware that they are just rough thoughts and the >> responses may not fit in general cases. The above is mostly for the original >> problem description where: >> >> 1. The system is designated for build-compile - my take is that there are many >> temporary and deleted files. >> 2. The gfs_inode tunable was changed (to 30, instead of default, 15). >> > > I'll take it into account when experimenting with the different settings. > > >>>> Isn't GFS_GL_HASH_SIZE too small for large amount of glocks? Being too >>>> small it results not only long linked lists but clashing at the same >>>> bucket will block otherwise parallel operations. Wouldn't it help >>>> increasing it from 8k to 65k? >>>> >>> Worth a try. >>> >> Now I remember .... we did experiment with different hash sizes when this >> latency issue was first reported two years ago. It didn't make much >> difference. The cache flushing, on the other hand, was more significant. >> > > What led me to suspect clashing in the hash (or some other lock-creating > issue) was the simple test I made on our five node cluster: on one node I > ran > > find /gfs -type f -exec cat {} > /dev/null \; > > and on another one just started an editor, naming a non-existent file. > It took multiple seconds while the editor "opened" the file. What else > than creating the lock could delay the process so long? 
> Not knowing how "find" is implemented, I would guess this is caused by directory locks. Creating a file needs a directory lock. Your exclusive write lock (file create) can't be granted until the "find" releases the directory lock. It doesn't look like a lock query performance issue to me. > >>> However, the issues involved here are more than lock searching time. It also >>> has to do with cache flushing. GFS currently accumulates too much dirty >>> caches. When it starts to flush, it will pause the system for too long. >>> Glock trimming helps - since cache flush is part of glock releasing >>> operation. >>> > > But 'flushing when releasing glock' looks as a side effect. I mean, isn't > there a more direct way to control the flushing? > > I can easily be totally wrong, but on the one hand, it's good to keep as > many locks cached as possible, because lock creation is expensive. But on > the other hand, trimming locks triggers flushing, which helps to keep the > systems running more smoothly. So a tunable to control flushing directly > would be better than just trimming the locks, isn't it. To make long story short, I did submit a direct cache flush patch first, instead of this final version of lock trimming patch. Unfortunately, it was *rejected*. -- Wendy From federico.simoncelli at gmail.com Thu Apr 10 08:57:44 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Thu, 10 Apr 2008 10:57:44 +0200 Subject: [Linux-cluster] Migration of VMs instead of relocation Message-ID: Hi everybody. Shutting down a cluster node results in relocate the services to another node (accomplished with stop and start). Is there any way to change this behavior to "migrate" for the virtual machines? It looks like this post should be related to my problem: http://article.gmane.org/gmane.linux.redhat.cluster/10848 The rgmanager version I'm using is 2.0.31-1. Thanks. -- Federico. From npf-mlists at eurotux.com Thu Apr 10 09:35:34 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Thu, 10 Apr 2008 10:35:34 +0100 Subject: [Linux-cluster] RHEL cluster upgrade from 5.0 to 5.1 Message-ID: <200804101035.35425.npf-mlists@eurotux.com> Hi, With a cluster of only clvmd is it possible to do a rolling upgrade from 5.0 to 5.1? By rolling upgrade i mean, 1 - select 1 node 2 - leave the node from cluster 3 - upgrade to 5.1 4 - join to cluster 5 - go to step 1 Any info? Thanks, Nuno Fernandes From Alain.Moulle at bull.net Thu Apr 10 12:51:59 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 10 Apr 2008 14:51:59 +0200 Subject: [Linux-cluster] CS5 / timers tuning (contd) Message-ID: <47FE0D6F.9060700@bull.net> Hi Lon Thans again, but that's strange because in the man , the recommended values are : intervall="1" tko="10" and so we have a result < 21s which is the default value of heart-beat timer, so not a hair above like you recommened in previous email ... extract of man qddisk : interval="1" This is the frequency of read/write cycles, in seconds. tko="10" This is the number of cycles a node must miss in order to be declared dead. ? PS : " don't recall if there's a way to do it from cluster.conf" yes we can change the deadnode_timeout in cluster.conf : Thanks Regards Alain Moull? >Hi Lon, and thanks for your answer, but : >> >> about "qdisk timeout" , you mean "tko" parameter in cluster.conf ? >> Because I can't see a qdisk "timeout" parameter in all your qdisk >> parameter description ... Right, interval + tko interval * tko = qdisk timeout See the qdisk man page for more details. 
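Putting numbers on that formula: the man page defaults of interval="1" and tko="10" give a qdisk timeout of 1 * 10 = 10 s, well under the default 21 s cman timeout, which is why the interval/tko pair has to be raised if the qdisk timeout is to end up a hair above cman's, as suggested earlier in the thread. An illustrative quorumd line chosen that way (label and values are only examples) would be:

    <quorumd interval="2" tko="11" label="myqdisk"/>
    <!-- 2 s * 11 cycles = 22 s, just above a 21 s cman timeout -->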
-- Lon From kadlec at sunserv.kfki.hu Thu Apr 10 13:00:40 2008 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef) Date: Thu, 10 Apr 2008 15:00:40 +0200 (CEST) Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <47FD2A01.1030708@gmail.com> References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> <47FD02D3.7010802@gmail.com> <47FD2A01.1030708@gmail.com> Message-ID: On Wed, 9 Apr 2008, Wendy Cheng wrote: > > What led me to suspect clashing in the hash (or some other lock-creating > > issue) was the simple test I made on our five node cluster: on one node I > > ran > > > > find /gfs -type f -exec cat {} > /dev/null \; > > > > and on another one just started an editor, naming a non-existent file. > > It took multiple seconds while the editor "opened" the file. What else than > > creating the lock could delay the process so long? > > > > Not knowing how "find" is implemented, I would guess this is caused by > directory locks. Creating a file needs a directory lock. Your exclusive write > lock (file create) can't be granted until the "find" releases the directory > lock. It doesn't look like a lock query performance issue to me. As /gfs is a large directory structure with hundreds of user home directories, somehow I don't think I could pick the same directory which was just processed by "find". But this is a good clue to what might bite us most! Our GFS cluster is an almost mail-only cluster for users with Maildir. When the users experience temporary hangups for several seconds (even when writing a new mail), it might be due to the concurrent scanning for a new mail on one node by the MUA and the delivery to the Maildir in another node by the MTA. What is really strange (and distrurbing) that such "hangups" can take 10-20 seconds which is just too much for the users. In order to look at the possible tuning options and the side effects, I list what I have learned so far: - Increasing glock_purge (percent, default 0) helps to trim back the unused glocks by gfs_scand itself. Otherwise glocks can accumulate and gfs_scand eats more and more time at scanning the larger and larger table of glocks. - gfs_scand wakes up every scand_secs (default 5s) to scan the glocks, looking for work to do. By increasing scand_secs one can lessen the load produced by gfs_scand, but it'll hurt because flushing data can be delayed. - Decreasing demote_secs (seconds, default 300) helps to flush cached data more often by moving write locks into less restricted states. Flushing often helps to avoid burstiness *and* to prolong another nodes' lock access. Question is, what are the side effects of small demote_secs values? (Probably there is no much point to choose smaller demote_secs value than scand_secs.) Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'. > > But 'flushing when releasing glock' looks as a side effect. I mean, isn't > > there a more direct way to control the flushing? > > To make long story short, I did submit a direct cache flush patch first, > instead of this final version of lock trimming patch. Unfortunately, it was > *rejected*. I see. Another question, just out of curiosity: why don't you use kernel timers for every glock instead of gfs_scand? The hash bucket id of the glock should be added to struct gfs_glock, but the timer function could be almost identical with scan_glock. 
As far as I see the only drawback were that it'd be equivalent with 'glock_purge = 100' and it'd be tricky to emulate glock_purge != 100 settings. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From Alain.Moulle at bull.net Thu Apr 10 14:01:45 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 10 Apr 2008 16:01:45 +0200 Subject: [Linux-cluster] CS5/ Little thing with clustat and the quorum disk Message-ID: <47FE1DC9.8050103@bull.net> Hi Something strange with clustat : if CS5 is launched with a valid Quorum Disk (that we can see with mkqdisk -L) and if we break the quorum disk (i.e mkfs on the device just to simulate a pb to reach the quorum disk), the clustat command always displays the Quorum Disk "Online" : #clustat Member Status: Quorate Member Name ID Status ------ ---- ---- ------ xena140 1 Online, Local, rgmanager xena141 2 Offline /dev/sdb 0 Online, Quorum Disk Note that in this case, the cluster is not at all disturbed, there are ony some messages in syslog like : qdiskd[30709]: Error reading node ID block ... but it just need to execute again a mkqdisk and no needs to stop/start again the CS, all is then working fine. Alain Moull? From jruemker at redhat.com Thu Apr 10 17:00:15 2008 From: jruemker at redhat.com (John Ruemker) Date: Thu, 10 Apr 2008 13:00:15 -0400 Subject: [Linux-cluster] Migration of VMs instead of relocation In-Reply-To: References: Message-ID: <47FE479F.9010900@redhat.com> Federico Simoncelli wrote: > Hi everybody. Shutting down a cluster node results in relocate the > services to another node (accomplished with stop and start). Is there > any way to change this behavior to "migrate" for the virtual machines? > It looks like this post should be related to my problem: > http://article.gmane.org/gmane.linux.redhat.cluster/10848 > The rgmanager version I'm using is 2.0.31-1. > Thanks. > > I believe migration will be the default in RHEL5.2, but for now you can follow the instructions at http://kbase.redhat.com/faq/FAQ_51_11879 John From federico.simoncelli at gmail.com Thu Apr 10 17:52:58 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Thu, 10 Apr 2008 19:52:58 +0200 Subject: [Linux-cluster] Migration of VMs instead of relocation In-Reply-To: <47FE479F.9010900@redhat.com> References: <47FE479F.9010900@redhat.com> Message-ID: On Thu, Apr 10, 2008 at 7:00 PM, John Ruemker wrote: > I believe migration will be the default in RHEL5.2, but for now you can > follow the instructions at http://kbase.redhat.com/faq/FAQ_51_11879 > > John I already fixed that. (It is just to improve performances right? live migration vs. migration) My problem is quite different. Imagine you have to shutdown a node for maintenance... you have to manually migrate all the vms to other nodes before actually shut it down. If you don't, rgmanager takes care of relocating the vms using "relocate" which will result in stopping the service (vm unclean destroy) and starting it somewhere else. I am trying to avoid this kind of behavior. -- Federico. 
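Until rgmanager does this by itself, the manual drain described above can be scripted; a rough sketch, where the target node name and the clustat output parsing are assumptions about the local setup (the owner column may be a cluster node name or FQDN rather than plain hostname output):

    #!/bin/sh
    # migrate every vm: service owned by this node, then stop rgmanager
    TARGET=node2                # example migration target
    ME=$(hostname)
    for vm in $(clustat | awk -v me="$ME" '$1 ~ /^vm:/ && $2 == me {print $1}'); do
        clusvcadm -M "$vm" -n "$TARGET"
    done
    service rgmanager stop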
From lpleiman at redhat.com Thu Apr 10 17:56:53 2008 From: lpleiman at redhat.com (Leo Pleiman) Date: Thu, 10 Apr 2008 13:56:53 -0400 Subject: [Linux-cluster] Migration of VMs instead of relocation In-Reply-To: References: <47FE479F.9010900@redhat.com> Message-ID: <47FE54E5.8000005@redhat.com> Isn't this fixed by adding --live to the migration line in the vm.sh script? Federico Simoncelli wrote: > On Thu, Apr 10, 2008 at 7:00 PM, John Ruemker wrote: > >> I believe migration will be the default in RHEL5.2, but for now you can >> follow the instructions at http://kbase.redhat.com/faq/FAQ_51_11879 >> >> John >> > > I already fixed that. (It is just to improve performances right? live > migration vs. migration) > My problem is quite different. > Imagine you have to shutdown a node for maintenance... you have to > manually migrate all the vms to other nodes before actually shut it > down. > If you don't, rgmanager takes care of relocating the vms using > "relocate" which will result in stopping the service (vm unclean > destroy) and starting it somewhere else. > I am trying to avoid this kind of behavior. > > -- Leo J Pleiman Senior Consultant, GPS Federal 410-688-3873 -------------- next part -------------- A non-text attachment was scrubbed... Name: lpleiman.vcf Type: text/x-vcard Size: 194 bytes Desc: not available URL: From jruemker at redhat.com Thu Apr 10 18:45:07 2008 From: jruemker at redhat.com (John Ruemker) Date: Thu, 10 Apr 2008 14:45:07 -0400 Subject: [Linux-cluster] Migration of VMs instead of relocation In-Reply-To: <47FE54E5.8000005@redhat.com> References: <47FE479F.9010900@redhat.com> <47FE54E5.8000005@redhat.com> Message-ID: <47FE6033.8030903@redhat.com> Yes, which is the fix included in that kbase I posted. Leo Pleiman wrote: > Isn't this fixed by adding --live to the migration line in the vm.sh > script? > > Federico Simoncelli wrote: >> On Thu, Apr 10, 2008 at 7:00 PM, John Ruemker >> wrote: >> >>> I believe migration will be the default in RHEL5.2, but for now you >>> can >>> follow the instructions at http://kbase.redhat.com/faq/FAQ_51_11879 >>> >>> John >>> >> >> I already fixed that. (It is just to improve performances right? live >> migration vs. migration) >> My problem is quite different. >> Imagine you have to shutdown a node for maintenance... you have to >> manually migrate all the vms to other nodes before actually shut it >> down. >> If you don't, rgmanager takes care of relocating the vms using >> "relocate" which will result in stopping the service (vm unclean >> destroy) and starting it somewhere else. >> I am trying to avoid this kind of behavior. >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From federico.simoncelli at gmail.com Thu Apr 10 19:12:32 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Thu, 10 Apr 2008 21:12:32 +0200 Subject: [Linux-cluster] Migration of VMs instead of relocation In-Reply-To: <47FE6033.8030903@redhat.com> References: <47FE479F.9010900@redhat.com> <47FE54E5.8000005@redhat.com> <47FE6033.8030903@redhat.com> Message-ID: On Thu, Apr 10, 2008 at 8:45 PM, John Ruemker wrote: > Yes, which is the fix included in that kbase I posted. > Leo Pleiman wrote: > > > > Isn't this fixed by adding --live to the migration line in the vm.sh > script? 
> > I agree with you that adding --live (or -l) would enable live migration but only when you issue a migrate request with the command: # clusvcadm -M vm:clustered_guest -n cluster2-2 (taken from the kbase posted) It works. Agreed. The problem is that when you shutdown a node the rgmanager is stopped (migration is not involved for as much as I know). # service rgmanager stop This would stop rgmanager (as much as a node shoutdown would do) and the services are stopped and started on available nodes. I would like to automatically migrate the vms instead. Can you try this and confirm that the fix in the kbase is not related to this behavior? Thank you. -- Federico. From isplist at logicore.net Thu Apr 10 19:52:31 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Thu, 10 Apr 2008 14:52:31 -0500 Subject: [Linux-cluster] Hardware/Sofware for VM use Message-ID: <2008410145231.870495@leena> I've had to take a break from hardware for a while as software was calling my name. Now I need to get my act together on the hardware side so thought I would ask for thoughts. I've been wanting to consolidate machines for a long time as power and cooling and waste of resources are of concern. I know I've touched on this before but the response was somewhat overwhelming to me since I have not yet touched VM environments so was not/am not aware of terminology yet. Questions; Hardware? I have powerful 8/16-way servers which I could use as VM servers. But, I also have dozens of small 1Ghz/512MB servers in the form of very reliable blade servers. I had asked about the possibility of creating an SSI style cluster then creating VM's out of that. Are there any such methods being used? Where a cluster of parallel processing computers is an options today? Or, are folks just basically using the most powerful computers they have, running two or more with shared storage for redundancy? Software; I'd like to take advantage of VM but it's not clear to me what is already included in the Linux packages, what is not, etc. I see a lot of projects out there and I'd much prefer to use RPM as well. If I have to, I can go the compile route. I prefer RPM's because they are so easy to manage for me. What would be a good starting path, one which would allow me to try VM, in a way that I can start migrating machines over to the new method. Thanks very much for your input/help on this. Mike From Christopher.Barry at qlogic.com Thu Apr 10 20:18:55 2008 From: Christopher.Barry at qlogic.com (christopher barry) Date: Thu, 10 Apr 2008 16:18:55 -0400 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <47FB8346.1070802@gmail.com> References: <1207622206.5259.66.camel@localhost> <47FB8346.1070802@gmail.com> Message-ID: <1207858736.5188.77.camel@localhost> On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote: > gordan at bobich.net wrote: > > > > > >> my setup: > >> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's > >> not new stuff, but corporate standards dictated the rev of rhat. > > [...] > >> I'm noticing huge differences in compile times - or any home file access > >> really - when doing stuff in the same home directory on the gfs on > >> different nodes. For instance, the same compile on one node is ~12 > >> minutes - on another it's 18 minutes or more (not running concurrently). > >> I'm also seeing weird random pauses in writes, like saving a file in vi, > >> what would normally take less than a second, may take up to 10 seconds. 
Anyway, thought I would re-connect to you all and let you know how this worked out. We ended up scrapping gfs. Not because it's not a great fs, but because I was using it in a way that was playing to it's weak points. I had a lot of time and energy invested in it, and it was hard to let it go. Turns out that connecting to the NetApp filer via nfs is faster for this workload. I couldn't believe it either, as my bonnie and dd type tests showed gfs to be faster. But for the use case of large sets of very small files, and lots of stats going on, gfs simply cannot compete with NetApp's nfs implementation. GFS is an excellent fs, and it has it's place in the landscape - but for a development build system, the NetApp is simply phenomenal. Thanks all for your assistance in the many months I have sought and received advice and help here. Regards, Christopher Barry From agspoon at gmail.com Thu Apr 10 21:27:05 2008 From: agspoon at gmail.com (Craig Johnston) Date: Thu, 10 Apr 2008 14:27:05 -0700 Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel Message-ID: We would like to achieve a stable GFS/GFS2 cluster configuration using a non-Redhat distribution that is based on a 2.6.21 kernel. Our first attempt was to obtain the Fedora Core 7 source rpms for the various components (cman, rgmanager, openais, etc.). We were successful in incorporating these packages into our distribution, and creating what should be a working cluster configuration with multiple nodes sharing a set of GFS2 file systems from an iSCSI SAN. The problem is that it is all very unstable, takes forever to start-up, and locks up under even small load. We would like to move to a more recent version of the cluster suite and update the kernel gfs2 and dlm modules for a 2.6.21 kernel. We need to stick with 2.6.21 for other reasons (vendor support mostly), and we figure if it all can be back ported for RHEL5.1 (2.6.18) it should be doable for 2.6.21. We just don't know where to start. Any advice on how we might proceed on this process would be greatly appreciated. Thanks, Craig From swhiteho at redhat.com Fri Apr 11 08:08:49 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 11 Apr 2008 09:08:49 +0100 Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel In-Reply-To: References: Message-ID: <1207901329.3635.342.camel@quoit> Hi, On Thu, 2008-04-10 at 14:27 -0700, Craig Johnston wrote: > We would like to achieve a stable GFS/GFS2 cluster configuration using > a non-Redhat distribution that is based on a 2.6.21 kernel. Our first > attempt was to obtain the Fedora Core 7 source rpms for the various > components (cman, rgmanager, openais, etc.). We were successful in > incorporating these packages into our distribution, and creating what > should be a working cluster configuration with multiple nodes sharing > a set of GFS2 file systems from an iSCSI SAN. > > The problem is that it is all very unstable, takes forever to > start-up, and locks up under even small load. We would like to move > to a more recent version of the cluster suite and update the kernel > gfs2 and dlm modules for a 2.6.21 kernel. We need to stick with > 2.6.21 for other reasons (vendor support mostly), and we figure if it > all can be back ported for RHEL5.1 (2.6.18) it should be doable for > 2.6.21. We just don't know where to start. > > Any advice on how we might proceed on this process would be greatly appreciated. 
> > Thanks, > Craig > If you want to use GFS2, then try F-8, or rawhide with the most uptodate set of packages. I would not recommend using a kernel that old for GFS2, Steve. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Fri Apr 11 09:38:17 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Fri, 11 Apr 2008 10:38:17 +0100 (BST) Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel In-Reply-To: References: Message-ID: On Thu, 10 Apr 2008, Craig Johnston wrote: > We would like to achieve a stable GFS/GFS2 cluster configuration using > a non-Redhat distribution that is based on a 2.6.21 kernel. Our first > attempt was to obtain the Fedora Core 7 source rpms for the various > components (cman, rgmanager, openais, etc.). We were successful in > incorporating these packages into our distribution, and creating what > should be a working cluster configuration with multiple nodes sharing > a set of GFS2 file systems from an iSCSI SAN. > > The problem is that it is all very unstable, takes forever to > start-up, and locks up under even small load. Use GFS1. GFS2 still does that. FC6+ no longer ships with GFS1 support built in as standard. If you're going to stick to the tried path, use RHEL5 (based) distributions. If you don't, you may well be better of just building the lot from source. It comes down to what your time is worth to you. Gordan From kadlec at sunserv.kfki.hu Fri Apr 11 11:05:08 2008 From: kadlec at sunserv.kfki.hu (Kadlecsik Jozsef) Date: Fri, 11 Apr 2008 13:05:08 +0200 (CEST) Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> <47FD02D3.7010802@gmail.com> <47FD2A01.1030708@gmail.com> Message-ID: On Thu, 10 Apr 2008, Kadlecsik Jozsef wrote: > But this is a good clue to what might bite us most! Our GFS cluster is an > almost mail-only cluster for users with Maildir. When the users experience > temporary hangups for several seconds (even when writing a new mail), it > might be due to the concurrent scanning for a new mail on one node by the > MUA and the delivery to the Maildir in another node by the MTA. > > What is really strange (and distrurbing) that such "hangups" can take > 10-20 seconds which is just too much for the users. Yesterday we started to monitor the number of locks/held locks on two of the machines. The results from the first day can be found at http://www.kfki.hu/~kadlec/gfs/. It looks as Maildir is definitely a wrong choice for GFS and we should consider to convert to mailbox format: at least I cannot explain the spikes in another way. > In order to look at the possible tuning options and the side effects, I > list what I have learned so far: > > - Increasing glock_purge (percent, default 0) helps to trim back the > unused glocks by gfs_scand itself. Otherwise glocks can accumulate and > gfs_scand eats more and more time at scanning the larger and > larger table of glocks. > - gfs_scand wakes up every scand_secs (default 5s) to scan the glocks, > looking for work to do. By increasing scand_secs one can lessen the load > produced by gfs_scand, but it'll hurt because flushing data can be > delayed. > - Decreasing demote_secs (seconds, default 300) helps to flush cached data > more often by moving write locks into less restricted states. 
Flushing > often helps to avoid burstiness *and* to prolong another nodes' > lock access. Question is, what are the side effects of small > demote_secs values? (Probably there is no much point to choose > smaller demote_secs value than scand_secs.) > > Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From jmacfarland at nexatech.com Fri Apr 11 14:07:41 2008 From: jmacfarland at nexatech.com (Jeff Macfarland) Date: Fri, 11 Apr 2008 09:07:41 -0500 Subject: [Linux-cluster] SSC Console? In-Reply-To: <20084912180.894658@leena> References: <20084912180.894658@leena> Message-ID: <47FF70AD.2020102@nexatech.com> Sounds like sun storage console to me. google "storedge ssconsole" isplist at logicore.net wrote: > Anyone know what an SSC console is? I'm trying to migrate some data between storage devices but the device requires that I enter the command from an SSC Console. > > EXAMPLE; > mylinuxsystem:~/tmp$ssc 192.168.43.70 > > Just like ssh, telnet, etc. Can't find this anywhere. > > Since everyone here is into storage, thought I'd ask. > > Mike > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > ___________________________________________________________________________ > > Inbound Email has been scanned by Nexa Technologies Email Security Systems. > ___________________________________________________________________________ -- Jeff Macfarland (jmacfarland at nexatech.com) Nexa Technologies - 972.747.8879 Systems Administrator GPG Key ID: 0x5F1CA61B GPG Key Server: hkp://wwwkeys.pgp.net From isplist at logicore.net Fri Apr 11 14:32:08 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Fri, 11 Apr 2008 09:32:08 -0500 Subject: [Linux-cluster] SSC Console? In-Reply-To: <47FF70AD.2020102@nexatech.com> Message-ID: <20084119328.411953@leena> It might be if it's something that can run on linux. I've come across symantec and sun but not much else giving this protocol away so far. It seems to run on default port 206. EXAMPLE; mylinuxsystem:~/tmp$ssc 192.168.43.70 On Fri, 11 Apr 2008 09:07:41 -0500, Jeff Macfarland wrote: >?Sounds like sun storage console to me. google "storedge ssconsole" From isplist at logicore.net Fri Apr 11 14:40:08 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Fri, 11 Apr 2008 09:40:08 -0500 Subject: [Linux-cluster] Hardware/Sofware for VM use In-Reply-To: <20080411120315.10707b89.pegasus@nerv.eu.org> Message-ID: <20084119408.728182@leena> I do use a plesk server and thanks for the lead on OpenVZ, looking at it right now. I am still wondering about the possible use of a number of slower servers being put to some good use. I wondered if perhaps redhat already has something along the lines of what I'm thinking about since they always have so many cool projects in the works. 
Mike From npf-mlists at eurotux.com Fri Apr 11 14:58:46 2008 From: npf-mlists at eurotux.com (Nuno Fernandes) Date: Fri, 11 Apr 2008 15:58:46 +0100 Subject: [Linux-cluster] RHEL cluster upgrade from 5.0 to 5.1 In-Reply-To: <200804101035.35425.npf-mlists@eurotux.com> References: <200804101035.35425.npf-mlists@eurotux.com> Message-ID: <200804111558.46449.npf-mlists@eurotux.com> On Thursday 10 April 2008 10:35:34 Nuno Fernandes wrote: > Hi, > > With a cluster of only clvmd is it possible to do a rolling upgrade from > 5.0 to 5.1? > > By rolling upgrade i mean, > > 1 - select 1 node > 2 - leave the node from cluster > 3 - upgrade to 5.1 > 4 - join to cluster > 5 - go to step 1 > > Any info? > Thanks, > Nuno Fernandes > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Does anyone know anything about this? ./npf From teigland at redhat.com Fri Apr 11 15:01:34 2008 From: teigland at redhat.com (David Teigland) Date: Fri, 11 Apr 2008 10:01:34 -0500 Subject: [Linux-cluster] cluster-2.03.00 Message-ID: <20080411150134.GB7435@redhat.com> A new source tarball of cluster code has been released: cluster-2.03.00 This has been taken from the STABLE2 branch in the cluster git tree. It is compatible with the current stable release of openais (0.80.3), and the current stable release of the kernel (2.6.24). ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.00.tar.gz To use gfs, a kernel patch is required to export three symbols from gfs2: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch Abhijith Das (3): gfs2_tool: remove 'gfs2_tool counters' as they aren't implemented anymore gfs-kernel: fix for bz 429343 gfs_glock_is_locked_by_me assertion gfs2_tool manpage: gfs2_tool counters doesn't exist anymore. Andrew Price (1): [[BUILD] Warn and continue if CONFIG_KERNELVERSION is not found Bob Peterson (9): Resolves: bz 435917: GFS2: mkfs.gfs2 default lock protocol Resolves: bz 421761: 'gfs_tool lockdump' wrongly says 'unknown Resolves: bz 431945: GFS: gfs-kernel should use device major:minor Update to prior commit for bz431945: I forgot that STABLE2 Resolves: bz 436383: GFS filesystem size inconsistent Fix savemeta so it saves gfs-1 rg information properly Fix gfs2_edit print options (-p) to work properly for gfs-1 gfs2_edit was not recalculating the max block size after it figured Fix some compiler warnings in gfs2_edit Chris Feist (1): Added back in change to description line to make chkconfig work properly. Christine Caulfield (5): [DLM] Don't segfault if lvbptr is NULL [CMAN] Free up any queued messages when someone disconnects [CMAN] Limit outstanding replies [CMAN] valid port number & don't use it before validation Remove references to broadcast. David Teigland (4): doc: update usage.txt groupd: purge messages from dead nodes dlm_tool: print correct rq mode in lockdump libdlm: fix lvb copying Fabio M. Di Nitto (8): [BUILD] Fix configure script to handle releases [BUILD] Fix build system with openais whitetank [BUILD] Allow release version to contain padding 0's Add toplevel .gitignore [BUILD] Fix handling of version and libraries soname [BUILD] Fix man page install permission Revert "Fix help message to refer to script as 'fence_scsi_test'." 
Revert "fix bz277781 by accepting "nodename" as a synonym for "node"" Joel Becker (1): libdlm: Don't pass LKF_WAIT to the kernel Jonathan Brassow (4): rgmanager/lvm.sh: Fix bug 438816 rgmanager/lvm.sh: Fix bug bz242798 rgmanager/lvm.sh: change argument order of shell command rgmanager/lvm.sh: Minor comment updates Lon Hohberger (10): Add Sybase failover agent Update changelog Add / fix Oracle 10g failover agent [rgmanager] Make ip.sh check link states of non-ethernet devices [rgmanager] Set cloexec bit in msg_socket.c [rgmanager] Don't call quotaoff if quotas are not used [CMAN] Fix "Node X is undead" loop bug [rgmanager] Fix #432998 [cman] Apply missing fix for #315711 [CMAN] Make cman init script start qdiskd intelligently Ryan McCabe (1): fix bz277781 by accepting "nodename" as a synonym for "node" Ryan O'Hara (15): Variable should be quoted in conditional statement. Fix unregister code to report failure correctly. Remove "self" parameter. This was used to specify the name of the node Fix code to use get_key subroutine. Fix split calls to be consistent. Remove the optional LIMIT parameter. Replace /var/lock/subsys/${0##*/} with /var/lock/subsys/scsi_reserve. Fix success/failure reporting when registering devices at startup. Rewrite of get_scsi_devices function. Record devices that are successfully registered to /var/run/scsi_reserve. Allow 'stop' to release the reservation if and only if there are no other Attempt to register the node in the case where it must perform fence_scsi Fix help message to refer to script as 'fence_scsi_test'. BZ 248715 BZ: 373491, 373511, 373531, 373541, 373571, 429033 BZ 441323 : Redirect stderr to /dev/null when getting list of devices. .gitignore | 1 + cman/daemon/Makefile | 3 +- cman/daemon/cmanccs.c | 11 +- cman/daemon/cnxman-private.h | 2 +- cman/daemon/commands.c | 2 +- cman/daemon/daemon.c | 40 ++- cman/daemon/daemon.h | 3 +- cman/init.d/cman.in | 32 ++ cman/init.d/qdiskd | 21 +- cman/lib/Makefile | 14 +- cman/man/cman_tool.8 | 20 +- cman/qdisk/main.c | 34 +- configure | 87 +++- dlm/lib/Makefile | 26 +- dlm/lib/libdlm.c | 15 +- dlm/tool/main.c | 8 +- doc/usage.txt | 87 ++--- fence/agents/scsi/fence_scsi.pl | 248 ++++++++-- fence/agents/scsi/fence_scsi_test.pl | 171 ++++--- fence/agents/scsi/scsi_reserve | 300 ++++++++---- gfs-kernel/src/gfs/glock.h | 15 +- gfs-kernel/src/gfs/ops_address.c | 29 +- gfs-kernel/src/gfs/proc.c | 9 +- gfs/gfs_grow/main.c | 4 +- gfs/gfs_tool/util.c | 64 +-- gfs2/edit/gfs2hex.c | 12 +- gfs2/edit/hexedit.c | 178 ++++++-- gfs2/edit/hexedit.h | 32 ++ gfs2/edit/savemeta.c | 38 +- gfs2/man/gfs2_tool.8 | 4 - gfs2/man/mkfs.gfs2.8 | 6 +- gfs2/tool/Makefile | 3 +- gfs2/tool/counters.c | 203 -------- gfs2/tool/main.c | 5 - group/daemon/app.c | 25 + group/daemon/cpg.c | 1 + group/daemon/gd_internal.h | 1 + group/dlm_controld/member_cman.c | 8 + make/defines.mk.input | 1 + make/man.mk | 2 +- rgmanager/ChangeLog | 4 + rgmanager/src/clulib/msg_socket.c | 12 + rgmanager/src/daemons/restree.c | 2 +- rgmanager/src/resources/ASEHAagent.sh | 893 +++++++++++++++++++++++++++++++++ rgmanager/src/resources/Makefile | 3 +- rgmanager/src/resources/fs.sh | 51 ++- rgmanager/src/resources/ip.sh | 18 +- rgmanager/src/resources/lvm.metadata | 13 +- rgmanager/src/resources/lvm.sh | 14 +- rgmanager/src/resources/lvm_by_lv.sh | 15 +- rgmanager/src/resources/lvm_by_vg.sh | 22 +- rgmanager/src/resources/oracleas | 792 ----------------------------- rgmanager/src/resources/oracledb.sh | 869 ++++++++++++++++++++++++++++++++ 53 files changed, 2954 insertions(+), 
1519 deletions(-) From s.wendy.cheng at gmail.com Fri Apr 11 15:28:37 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Fri, 11 Apr 2008 10:28:37 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <1207858736.5188.77.camel@localhost> References: <1207622206.5259.66.camel@localhost> <47FB8346.1070802@gmail.com> <1207858736.5188.77.camel@localhost> Message-ID: <47FF83A5.6010806@gmail.com> christopher barry wrote: > On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote: > >> gordan at bobich.net wrote: >> >>> >>>> my setup: >>>> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's >>>> not new stuff, but corporate standards dictated the rev of rhat. >>>> >>> [...] >>> >>>> I'm noticing huge differences in compile times - or any home file access >>>> really - when doing stuff in the same home directory on the gfs on >>>> different nodes. For instance, the same compile on one node is ~12 >>>> minutes - on another it's 18 minutes or more (not running concurrently). >>>> I'm also seeing weird random pauses in writes, like saving a file in vi, >>>> what would normally take less than a second, may take up to 10 seconds. >>>> > > Anyway, thought I would re-connect to you all and let you know how this > worked out. We ended up scrapping gfs. Not because it's not a great fs, > but because I was using it in a way that was playing to it's weak > points. I had a lot of time and energy invested in it, and it was hard > to let it go. Turns out that connecting to the NetApp filer via nfs is > faster for this workload. I couldn't believe it either, as my bonnie and > dd type tests showed gfs to be faster. But for the use case of large > sets of very small files, and lots of stats going on, gfs simply cannot > compete with NetApp's nfs implementation. GFS is an excellent fs, and it > has it's place in the landscape - but for a development build system, > the NetApp is simply phenomenal. > Assuming you run both configurations (nfs-wafl vs. gfs-san) on the very same netapp box (?) ... Both configurations have their pros and cons. The wafl-nfs runs on native mode that certainly has its advantages - you've made a good choice but the latter (gfs-on-netapp san) can work well in other situations. The biggest problem with your original configuration is the load-balancer. The round-robin (and its variants) scheduling will not work well if you have a write intensive workload that needs to fight for locks between multiple GFS nodes. IIRC, there are gfs customers running on build-compile development environment. They normally assign groups of users on different GFS nodes, say user id starting with a-e on node 1, f-j on node2, etc. One encouraging news from this email is gfs-netapp-san runs well on bonnie. GFS1 has been struggling with bonnie (large amount of smaller files within one single node) for a very long time. One of the reasons is its block allocation tends to get spread across the disk whenever there are resource group contentions. It is very difficult for linux IO scheduler to merge these blocks within one single server. When the workload becomes IO-bound, the locks are subsequently stalled and everything start to snow-ball after that. Netapp SAN has one more layer of block allocation indirection within its firmware and its write speed is "phenomenal" (I'm borrowing your words ;) ), mostly to do with the NVRAM where it can aggressively cache write data - this helps GFS to relieve its small file issue quite well. 
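
As a rough way to reproduce that comparison, a small-file bonnie++ run against each mount exercises exactly the create/stat/delete pattern being described while skipping the streaming-I/O phase that both setups already handle well. The mount point, file counts and user below are only placeholders:

    # 16*1024 files between 0 and 16 KB spread over 64 directories; -s 0 skips
    # the large-file throughput tests; run once on the GFS mount, once on NFS
    bonnie++ -d /mnt/gfs/bench -s 0 -n 16:16384:0:64 -u nobody
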
-- Wendy From Christopher.Barry at qlogic.com Fri Apr 11 15:47:16 2008 From: Christopher.Barry at qlogic.com (christopher barry) Date: Fri, 11 Apr 2008 11:47:16 -0400 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: <47FF83A5.6010806@gmail.com> References: <1207622206.5259.66.camel@localhost> <47FB8346.1070802@gmail.com> <1207858736.5188.77.camel@localhost> <47FF83A5.6010806@gmail.com> Message-ID: <1207928836.5229.33.camel@localhost> On Fri, 2008-04-11 at 10:28 -0500, Wendy Cheng wrote: > christopher barry wrote: > > On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote: > > > >> gordan at bobich.net wrote: > >> > >>> > >>>> my setup: > >>>> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's > >>>> not new stuff, but corporate standards dictated the rev of rhat. > >>>> > >>> [...] > >>> > >>>> I'm noticing huge differences in compile times - or any home file access > >>>> really - when doing stuff in the same home directory on the gfs on > >>>> different nodes. For instance, the same compile on one node is ~12 > >>>> minutes - on another it's 18 minutes or more (not running concurrently). > >>>> I'm also seeing weird random pauses in writes, like saving a file in vi, > >>>> what would normally take less than a second, may take up to 10 seconds. > >>>> > > > > Anyway, thought I would re-connect to you all and let you know how this > > worked out. We ended up scrapping gfs. Not because it's not a great fs, > > but because I was using it in a way that was playing to it's weak > > points. I had a lot of time and energy invested in it, and it was hard > > to let it go. Turns out that connecting to the NetApp filer via nfs is > > faster for this workload. I couldn't believe it either, as my bonnie and > > dd type tests showed gfs to be faster. But for the use case of large > > sets of very small files, and lots of stats going on, gfs simply cannot > > compete with NetApp's nfs implementation. GFS is an excellent fs, and it > > has it's place in the landscape - but for a development build system, > > the NetApp is simply phenomenal. > > > > Assuming you run both configurations (nfs-wafl vs. gfs-san) on the very > same netapp box (?) ... yes. > > Both configurations have their pros and cons. The wafl-nfs runs on > native mode that certainly has its advantages - you've made a good > choice but the latter (gfs-on-netapp san) can work well in other > situations. The biggest problem with your original configuration is the > load-balancer. The round-robin (and its variants) scheduling will not > work well if you have a write intensive workload that needs to fight for > locks between multiple GFS nodes. IIRC, there are gfs customers running > on build-compile development environment. They normally assign groups of > users on different GFS nodes, say user id starting with a-e on node 1, > f-j on node2, etc. exactly. I was about to implement the sh (source hash) scheduler in LVS, which I believe would have accomplished the same thing, only automatically, and in a statistically balanced way. Actually still might. I've had some developers test out the nfs solution and for some gfs is still better. I know that if users are pinned to a node - but can still failover in the event of node failure - this would yield the best possible performance. The main reason the IT group wants to use nfs, is for all of the other benefits, such as file-level snapshots, better backup performance, etc. 
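
For reference, the source-hash scheduler mentioned above is a per-virtual-service setting in LVS; a minimal sketch with ipvsadm (the VIP, port and real-server addresses here are invented) would be:

    # hash on the client source address so a given user always lands on the
    # same GFS node (-s sh), direct routing to the real servers (-g)
    ipvsadm -A -t 10.0.0.100:22 -s sh
    ipvsadm -a -t 10.0.0.100:22 -r 10.0.0.11 -g
    ipvsadm -a -t 10.0.0.100:22 -r 10.0.0.12 -g
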
Now that they see a chink in the gfs performance armor, mainly because I implemented the wrong load balancing algorithm, they're circling for the kill. I'm interested how well the nfs will scale with users vs. the gfs-san approach. > > One encouraging news from this email is gfs-netapp-san runs well on > bonnie. GFS1 has been struggling with bonnie (large amount of smaller > files within one single node) for a very long time. One of the reasons > is its block allocation tends to get spread across the disk whenever > there are resource group contentions. It is very difficult for linux IO > scheduler to merge these blocks within one single server. When the > workload becomes IO-bound, the locks are subsequently stalled and > everything start to snow-ball after that. Netapp SAN has one more layer > of block allocation indirection within its firmware and its write speed > is "phenomenal" (I'm borrowing your words ;) ), mostly to do with the > NVRAM where it can aggressively cache write data - this helps GFS to > relieve its small file issue quite well. Thanks for all of your input Wendy. -C > > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From agspoon at gmail.com Fri Apr 11 16:05:35 2008 From: agspoon at gmail.com (Craig Johnston) Date: Fri, 11 Apr 2008 09:05:35 -0700 Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel In-Reply-To: <1207901329.3635.342.camel@quoit> References: <1207901329.3635.342.camel@quoit> Message-ID: On Fri, Apr 11, 2008 at 1:08 AM, Steven Whitehouse wrote: > Hi, > > > > On Thu, 2008-04-10 at 14:27 -0700, Craig Johnston wrote: > > We would like to achieve a stable GFS/GFS2 cluster configuration using > > a non-Redhat distribution that is based on a 2.6.21 kernel. Our first > > attempt was to obtain the Fedora Core 7 source rpms for the various > > components (cman, rgmanager, openais, etc.). We were successful in > > incorporating these packages into our distribution, and creating what > > should be a working cluster configuration with multiple nodes sharing > > a set of GFS2 file systems from an iSCSI SAN. > > > > The problem is that it is all very unstable, takes forever to > > start-up, and locks up under even small load. We would like to move > > to a more recent version of the cluster suite and update the kernel > > gfs2 and dlm modules for a 2.6.21 kernel. We need to stick with > > 2.6.21 for other reasons (vendor support mostly), and we figure if it > > all can be back ported for RHEL5.1 (2.6.18) it should be doable for > > 2.6.21. We just don't know where to start. > > > > Any advice on how we might proceed on this process would be greatly appreciated. > > > > Thanks, > > Craig > > > If you want to use GFS2, then try F-8, or rawhide with the most uptodate > set of packages. I would not recommend using a kernel that old for GFS2, > > Steve. Do you think we could be successful in patching up the GFS2/DLM modules in our 2.6.21 kernel to bring it up to a more recent version? How coupled is the GFS2/DLM code to the rest of the kernel? We have a number of machines running CentOS 5.1. Does it seem feasible to select the applicable patches from that distribution and apply them to a 2.6.21 kernel (with some tweaks no doubt)? 
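
One quick way to gauge the size of that backport, assuming a mainline git clone is at hand, is to list what touched gfs2 and dlm between the two kernels; anything that depends on VFS or other core-kernel changes will not show up this way, so treat the count as a lower bound:

    # commits touching gfs2/dlm between 2.6.21 and a newer baseline
    git log --pretty=oneline v2.6.21..v2.6.24 -- fs/gfs2 fs/dlm | wc -l
    git log --pretty=oneline v2.6.21..v2.6.24 -- fs/gfs2 fs/dlm
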
Craig From agspoon at gmail.com Fri Apr 11 16:09:14 2008 From: agspoon at gmail.com (Craig Johnston) Date: Fri, 11 Apr 2008 09:09:14 -0700 Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel In-Reply-To: References: Message-ID: On Fri, Apr 11, 2008 at 2:38 AM, wrote: > On Thu, 10 Apr 2008, Craig Johnston wrote: > > > > We would like to achieve a stable GFS/GFS2 cluster configuration using > > a non-Redhat distribution that is based on a 2.6.21 kernel. Our first > > attempt was to obtain the Fedora Core 7 source rpms for the various > > components (cman, rgmanager, openais, etc.). We were successful in > > incorporating these packages into our distribution, and creating what > > should be a working cluster configuration with multiple nodes sharing > > a set of GFS2 file systems from an iSCSI SAN. > > > > The problem is that it is all very unstable, takes forever to > > start-up, and locks up under even small load. > > > > Use GFS1. GFS2 still does that. FC6+ no longer ships with GFS1 support > built in as standard. If you're going to stick to the tried path, use RHEL5 > (based) distributions. If you don't, you may well be better of just building > the lot from source. It comes down to what your time is worth to you. > > Gordan Yes, we thought that we might have better luck with GFS, but like you say it is not really available in more recent distributions. What would we need to do to get GFS working? It is only the gfs-tools that are missing, or do we need kernel changes as well? Do the recent versions of the cluster tools work with GFS? Craig From swhiteho at redhat.com Fri Apr 11 16:13:59 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 11 Apr 2008 17:13:59 +0100 Subject: [Linux-cluster] Achieving a stable cluster with a 2.6.21 kernel In-Reply-To: References: <1207901329.3635.342.camel@quoit> Message-ID: <1207930439.3635.352.camel@quoit> Hi, On Fri, 2008-04-11 at 09:05 -0700, Craig Johnston wrote: > On Fri, Apr 11, 2008 at 1:08 AM, Steven Whitehouse wrote: > > Hi, > > > > > > > > On Thu, 2008-04-10 at 14:27 -0700, Craig Johnston wrote: > > > We would like to achieve a stable GFS/GFS2 cluster configuration using > > > a non-Redhat distribution that is based on a 2.6.21 kernel. Our first > > > attempt was to obtain the Fedora Core 7 source rpms for the various > > > components (cman, rgmanager, openais, etc.). We were successful in > > > incorporating these packages into our distribution, and creating what > > > should be a working cluster configuration with multiple nodes sharing > > > a set of GFS2 file systems from an iSCSI SAN. > > > > > > The problem is that it is all very unstable, takes forever to > > > start-up, and locks up under even small load. We would like to move > > > to a more recent version of the cluster suite and update the kernel > > > gfs2 and dlm modules for a 2.6.21 kernel. We need to stick with > > > 2.6.21 for other reasons (vendor support mostly), and we figure if it > > > all can be back ported for RHEL5.1 (2.6.18) it should be doable for > > > 2.6.21. We just don't know where to start. > > > > > > Any advice on how we might proceed on this process would be greatly appreciated. > > > > > > Thanks, > > > Craig > > > > > If you want to use GFS2, then try F-8, or rawhide with the most uptodate > > set of packages. I would not recommend using a kernel that old for GFS2, > > > > Steve. > > Do you think we could be successful in patching up the GFS2/DLM > modules in our 2.6.21 kernel to bring it up to a more recent version? 
> How coupled is the GFS2/DLM code to the rest of the kernel? We have > a number of machines running CentOS 5.1. Does it seem feasible to > select the applicable patches from that distribution and apply them to > a 2.6.21 kernel (with some tweaks no doubt)? > > Craig Not easily. One of the bugs since then was solved by a change in the VFS so that its not just a question of applying patches to gfs2 on its own. The version of GFS2 in RHEL has a different fix for this problem though, so you might be able to borrow that. Either way its not going to be an easy task and using a more recent kernel would be a much quicker way of getting a more stable GFS2, Steve. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lexi.herrera at gmail.com Fri Apr 11 17:22:29 2008 From: lexi.herrera at gmail.com (Lexi Herrera) Date: Fri, 11 Apr 2008 13:22:29 -0400 Subject: [Linux-cluster] rhel hpcc Message-ID: <6c3ea40804111022w18e53040r75cde52ef3ba605a@mail.gmail.com> hello to all, I need aid to install cluster hpcc with red hat el as r 4.5, it has 4 node with 4 cpu each on, the computers are ibm blades. -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Sat Apr 12 04:16:52 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Fri, 11 Apr 2008 23:16:52 -0500 Subject: [Linux-cluster] dlm and IO speed problem In-Reply-To: References: <1207622206.5259.66.camel@localhost> <1a2a6dd60804080213x17dc2578s75fcf7a92ea35790@mail.gmail.com> <47FCDB60.8030807@gmail.com> <47FD02D3.7010802@gmail.com> <47FD2A01.1030708@gmail.com> Message-ID: <480037B4.8000406@gmail.com> Kadlecsik Jozsef wrote: > On Thu, 10 Apr 2008, Kadlecsik Jozsef wrote: > > >> But this is a good clue to what might bite us most! Our GFS cluster is an >> almost mail-only cluster for users with Maildir. When the users experience >> temporary hangups for several seconds (even when writing a new mail), it >> might be due to the concurrent scanning for a new mail on one node by the >> MUA and the delivery to the Maildir in another node by the MTA. >> I personally don't know much about mail server. But if anyone can explain more about what these two processes (?) do, say, how does that "MTA" deliver its mail (by "rename" system call ?) and/or how mails are moved from which node to where, we may have a better chance to figure this puzzle out. Note that "rename" system call is normally very expensive. Minimum 4 exclusive locks are required (two directory locks, one file lock for unlink, one file lock for link), plus resource group lock if block allocation is required. There are numerous chances for deadlocks if not handled carefully. The issue is further worsen by the way GFS1 does its lock ordering - it obtains multiple locks based on lock name order. Most of the locknames are taken from inode number so their sequence always quite random. As soon as lock contention occurs, lock requests will be serialized to avoid deadlocks. So this may be a cause for these spikes where "rename"(s) are struggling to get lock order straight. But I don't know for sure unless someone explains how email server does its things. BTW, GFS2 has relaxed this lock order issue so it should work better. I'm having a trip (away from internet) but I'm interested to know this story... Maybe by the time I get back on my laptop, someone has figured this out. But please do share the story :) ... 
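
To sketch the delivery side of the question: a Maildir MTA conventionally writes the message into tmp/ under a unique name and then rename()s it into new/; the reader later renames it again from new/ into cur/. So every delivery and every pickup is a cross-directory rename inside the same Maildir, which is exactly the expensive multi-lock path described above. A simplified shell illustration (the path and the unique-name recipe are trimmed down):

    MAILDIR=/gfs/home/alice/Maildir
    UNIQ="$(date +%s).$$.$(hostname)"              # time.pid.host, per the Maildir convention
    cat > "$MAILDIR/tmp/$UNIQ"                     # message body arrives on stdin
    mv "$MAILDIR/tmp/$UNIQ" "$MAILDIR/new/$UNIQ"   # rename(2) makes it visible atomically
    # the reader later does new/ -> cur/, another rename on the same directories
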
-- Wendy >> What is really strange (and distrurbing) that such "hangups" can take >> 10-20 seconds which is just too much for the users. >> > > Yesterday we started to monitor the number of locks/held locks on two of > the machines. The results from the first day can be found at > http://www.kfki.hu/~kadlec/gfs/. > > It looks as Maildir is definitely a wrong choice for GFS and we should > consider to convert to mailbox format: at least I cannot explain the > spikes in another way. > > >> In order to look at the possible tuning options and the side effects, I >> list what I have learned so far: >> >> - Increasing glock_purge (percent, default 0) helps to trim back the >> unused glocks by gfs_scand itself. Otherwise glocks can accumulate and >> gfs_scand eats more and more time at scanning the larger and >> larger table of glocks. >> - gfs_scand wakes up every scand_secs (default 5s) to scan the glocks, >> looking for work to do. By increasing scand_secs one can lessen the load >> produced by gfs_scand, but it'll hurt because flushing data can be >> delayed. >> - Decreasing demote_secs (seconds, default 300) helps to flush cached data >> more often by moving write locks into less restricted states. Flushing >> often helps to avoid burstiness *and* to prolong another nodes' >> lock access. Question is, what are the side effects of small >> demote_secs values? (Probably there is no much point to choose >> smaller demote_secs value than scand_secs.) >> >> Currently we are running with 'glock_purge = 20' and 'demote_secs = 30'. >> > > Best regards, > Jozsef > -- > E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu > PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt > Address: KFKI Research Institute for Particle and Nuclear Physics > H-1525 Budapest 114, POB. 49, Hungary > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From lists at tangent.co.za Sun Apr 13 16:43:05 2008 From: lists at tangent.co.za (Chris Picton) Date: Sun, 13 Apr 2008 16:43:05 +0000 (UTC) Subject: [Linux-cluster] DRBD and redhat cluster Message-ID: Hi all I am planning a pair of machines using DRBD to export redundant block devices via gnbd I will have about 10 other machines using gfs2 to access this data. I have thought of a few different ways of accomplishing this, and would appreciate any feedback you may have. I can run the drbd exports in one of three ways: 1. Single drbd device in single master mode. The drbd primary resource, shared ip address and gndb export will fail over between the two nodes. Advantages. Simple setup Disadvantages. One machine always fairly idle 2. Two drbd devices on each server. Server1 will be primary on the one device, server 2 will be primary on the other device. As above, each drbd resource will have an associated IP address and gnbd export associated with it. These resources will fail over between the two nodes. The remaining cluster nodes will assemble these using a striped clvm. Advantages. The two nodes will be equally used. Will get double the throughput from the pair of machines (assuming high speed bonded crossover between the two). Also, will be able to add another pair of these machines later to increase bandwidth and storage space for the LVM. Disadvantages: More complex 3. A drbd device in master/master mode, exported from both via gnbd. Cluster members will access this via dm-multipath. Advantages: not sure Disadvantages: Will bandwidth be shared between the two machines? 
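
For option 3, the master/master part is only a couple of lines in a DRBD 8 resource definition; a rough sketch as a starting point, before worrying about the GNBD/multipath layer on top (the resource name, hostnames, devices and addresses are placeholders, and the usual syncer/handler sections are left out):

    resource r0 {
        protocol C;
        net {
            allow-two-primaries;        # permit primary/primary
        }
        startup {
            become-primary-on both;     # promote both nodes when the resource starts
        }
        on server1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.10.1:7788;
            meta-disk internal;
        }
        on server2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   192.168.10.2:7788;
            meta-disk internal;
        }
    }
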
In addition to the options above, I am also wondering about the cluster software I should use. I will have to use RHCS to use gfs2 and fence_gnbd correctly, but is heartbeat a better choice than failover domains for the drbd exports? Any insight will be helpful Chris From gordan at bobich.net Mon Apr 14 12:19:22 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Mon, 14 Apr 2008 13:19:22 +0100 (BST) Subject: [Linux-cluster] Fencing Driver API Requirements Message-ID: Hi, I remember that this was mentioned several times in the last few months, but has any documentation been put together on the API that the fencing drivers are supposed to cover? I'm looking into writing a fencing driver based on disabling switch ports on a managed 3com switch via the telnet interface, and I'd like to make sure that it conforms to any speciffic requirements that might exist. If someone could point me at the relevant URL, that would be most appreciated. Gordan From arjuna.christensen at maxnet.co.nz Mon Apr 14 13:31:25 2008 From: arjuna.christensen at maxnet.co.nz (Arjuna Christensen) Date: Tue, 15 Apr 2008 01:31:25 +1200 Subject: [Linux-cluster] DRBD and redhat cluster In-Reply-To: References: Message-ID: <6DD7CC182D1E154E9F5FF6301B077EFE72EB16@exchange01.office.maxnet.co.nz> I've been using RHCS to control DRBD quite happily, but only in a active/passive scenario. All it requires is a little script, and an rgmanager '' object: #!/bin/bash exec /etc/ha.d/resource.d/drbddisk $@ (/etc/ha.d/resource.d/drbddisk is installed by the DRBD package) Regards, Arjuna Christensen?|?Systems Engineer? Maximum Internet Ltd DDI: + 64 9?913 9683 | Ph: +64 9 915 1825 | Fax:: +64 9 300 7227 arjuna.christensen at maxnet.co.nz| www.maxnet.co.nz -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Picton Sent: Monday, 14 April 2008 4:43 a.m. To: linux-cluster at redhat.com Subject: [Linux-cluster] DRBD and redhat cluster Hi all I am planning a pair of machines using DRBD to export redundant block devices via gnbd I will have about 10 other machines using gfs2 to access this data. I have thought of a few different ways of accomplishing this, and would appreciate any feedback you may have. I can run the drbd exports in one of three ways: 1. Single drbd device in single master mode. The drbd primary resource, shared ip address and gndb export will fail over between the two nodes. Advantages. Simple setup Disadvantages. One machine always fairly idle 2. Two drbd devices on each server. Server1 will be primary on the one device, server 2 will be primary on the other device. As above, each drbd resource will have an associated IP address and gnbd export associated with it. These resources will fail over between the two nodes. The remaining cluster nodes will assemble these using a striped clvm. Advantages. The two nodes will be equally used. Will get double the throughput from the pair of machines (assuming high speed bonded crossover between the two). Also, will be able to add another pair of these machines later to increase bandwidth and storage space for the LVM. Disadvantages: More complex 3. A drbd device in master/master mode, exported from both via gnbd. Cluster members will access this via dm-multipath. Advantages: not sure Disadvantages: Will bandwidth be shared between the two machines? In addition to the options above, I am also wondering about the cluster software I should use. 
I will have to use RHCS to use gfs2 and fence_gnbd correctly, but is heartbeat a better choice than failover domains for the drbd exports? Any insight will be helpful Chris -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From mgrac at redhat.com Mon Apr 14 18:47:32 2008 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Mon, 14 Apr 2008 20:47:32 +0200 Subject: [Linux-cluster] Fencing Driver API Requirements In-Reply-To: References: Message-ID: <4803A6C4.20706@redhat.com> Hi, gordan at bobich.net wrote: > > I remember that this was mentioned several times in the last few > months, but has any documentation been put together on the API that > the fencing drivers are supposed to cover? > > I'm looking into writing a fencing driver based on disabling switch > ports on a managed 3com switch via the telnet interface, and I'd like > to make sure > that it conforms to any speciffic requirements that might exist. If > someone could point me at the relevant URL, that would be most > appreciated. There is a new python module in the git (master branch / cluster/gence/agents/lib/fencing.py) that should contain everything you should need to write a fence agent. This module was used to built several agents (they are just in the git tree) eg. apc/apc.py, drac/drac5.py, wti/wti.py. If you will find any problem with fencing.py, let me know and I will try to fix it. marx, -- Marek Grac Red Hat Czech s.r.o. From underscore_dot at yahoo.com Mon Apr 14 21:40:10 2008 From: underscore_dot at yahoo.com (nch) Date: Mon, 14 Apr 2008 14:40:10 -0700 (PDT) Subject: [Linux-cluster] how can I share a logical volume? Message-ID: <831443.94855.qm@web32406.mail.mud.yahoo.com> Hello, everybody. I'm trying to run a cluster with 3 nodes. One of them would share storage with the other two using GFS and DLM (kernel 2.4.18-6). I was able to start ccsd, cman, fenced and clvmd in all nodes. I've defined a logical volume in the storage node and was able to gfs_mkfs, activate it locally and mount it, but I don't know how to make it available/visible to the other two nodes. Do you know how to do this? I've followed instruction given in http://sources.redhat.com/cluster/doc/usage.txt (except for setting locking_type=2). Many thanks. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shawnlhood at gmail.com Tue Apr 15 00:06:07 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 14 Apr 2008 20:06:07 -0400 Subject: [Linux-cluster] how can I share a logical volume? In-Reply-To: <831443.94855.qm@web32406.mail.mud.yahoo.com> References: <831443.94855.qm@web32406.mail.mud.yahoo.com> Message-ID: As far as I know, you should be able to at least SEE the logical volume as long as there is a path to the physical volumes on the other nodes. Are you able to see the same block devices (eg /dev/sd?) on the other nodes? Shawn Hood 2008/4/14 nch : > > Hello, everybody. > > I'm trying to run a cluster with 3 nodes. One of them would share storage > with the other two using GFS and DLM (kernel 2.4.18-6). > I was able to start ccsd, cman, fenced and clvmd in all nodes. 
I've defined > a logical volume in the storage node and was able to gfs_mkfs, activate it > locally and mount it, but I don't know how to make it available/visible to > the other two nodes. > Do you know how to do this? > I've followed instruction given in > http://sources.redhat.com/cluster/doc/usage.txt (except for setting > locking_type=2). > > Many thanks. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From underscore_dot at yahoo.com Tue Apr 15 06:22:11 2008 From: underscore_dot at yahoo.com (nch) Date: Mon, 14 Apr 2008 23:22:11 -0700 (PDT) Subject: [Linux-cluster] how can I share a logical volume? Message-ID: <694493.83478.qm@web32404.mail.mud.yahoo.com> No, I can't see the logical volumes on the other nodes. vgscan doesn't show any, nor I can find any new devices in /dev. As I couldn't find docs/examples on this particular point, I really don't know what to expect. I'm trying with different types of logical volumes (stripped, mirrored), but didn't make a difference. For the moment, I'm running all the stuff on virtual machines, could this be an issue? For the moment I'm using a minimal cluster.conf, in which I just declare the nodes. Should I add specific configurations to it? Lots of thanks. ----- Original Message ---- From: Shawn Hood To: linux clustering Sent: Tuesday, April 15, 2008 2:06:07 AM Subject: Re: [Linux-cluster] how can I share a logical volume? As far as I know, you should be able to at least SEE the logical volume as long as there is a path to the physical volumes on the other nodes. Are you able to see the same block devices (eg /dev/sd?) on the other nodes? Shawn Hood 2008/4/14 nch : > > Hello, everybody. > > I'm trying to run a cluster with 3 nodes. One of them would share storage > with the other two using GFS and DLM (kernel 2.4.18-6). > I was able to start ccsd, cman, fenced and clvmd in all nodes. I've defined > a logical volume in the storage node and was able to gfs_mkfs, activate it > locally and mount it, but I don't know how to make it available/visible to > the other two nodes. > Do you know how to do this? > I've followed instruction given in > http://sources.redhat.com/cluster/doc/usage.txt (except for setting > locking_type=2). > > Many thanks. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Tue Apr 15 14:22:24 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 15 Apr 2008 16:22:24 +0200 Subject: [Linux-cluster] CS5 / timers tuning (contd) Message-ID: <4804BA20.30709@bull.net> Hi Lon Thans again, but that's strange because in the man , the recommended values are : intervall="1" tko="10" and so we have a result < 21s which is the default value of heart-beat timer, so not a hair above like you recommened in previous email ... extract of man qddisk : interval="1" This is the frequency of read/write cycles, in seconds. tko="10" This is the number of cycles a node must miss in order to be declared dead. ? 
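
(The cluster.conf snippet in the message above appears to have been stripped by the list archiver. Purely as an illustration of the knobs being discussed, not a reconstruction of the original: the interval/tko values are the man-page defaults quoted above, while the label and the ping target are invented. On CS5 the relevant pieces look roughly like this:)

    <cluster name="example" config_version="1">
        <!-- qdiskd: one read/write cycle per second, node declared dead after 10 missed cycles -->
        <quorumd interval="1" tko="10" votes="1" label="myqdisk">
            <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2"/>
        </quorumd>
        <!-- on CS5 the cman dead-node time is driven by the openais token timeout (in ms);
             keep it comfortably above interval*tko of the quorum disk (21000 ms > 10 s here) -->
        <totem token="21000"/>
        <!-- clusternodes, fencedevices and rm sections omitted -->
    </cluster>
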
PS : " don't recall if there's a way to do it from cluster.conf" yes we can change the deadnode_timeout in cluster.conf : Thanks Regards Alain Moull? From andrew at ntsg.umt.edu Tue Apr 15 15:06:30 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Tue, 15 Apr 2008 09:06:30 -0600 (MDT) Subject: [Linux-cluster] how can I share a logical volume? In-Reply-To: <694493.83478.qm@web32404.mail.mud.yahoo.com> References: <694493.83478.qm@web32404.mail.mud.yahoo.com> Message-ID: <41086.10.8.105.69.1208271990.squirrel@secure.ntsg.umt.edu> Did you setup the node with the storage as a gnbd server and the other nodes as gnbd clients? I think this is what you want in order for the nodes to all have block level access to the storage for clvmd and dlm to run on top of. -A -- Andrew A. Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 On Tue, April 15, 2008 12:22 am, nch wrote: > No, I can't see the logical volumes on the other nodes. vgscan doesn't > show any, nor I can find any new devices in /dev. > As I couldn't find docs/examples on this particular point, I really don't > know what to expect. > I'm trying with different types of logical volumes (stripped, mirrored), > but didn't make a difference. > For the moment, I'm running all the stuff on virtual machines, could this > be an issue? > For the moment I'm using a minimal cluster.conf, in which I just declare > the nodes. Should I add specific configurations to it? > > Lots of thanks. > > ----- Original Message ---- > From: Shawn Hood > To: linux clustering > Sent: Tuesday, April 15, 2008 2:06:07 AM > Subject: Re: [Linux-cluster] how can I share a logical volume? > > As far as I know, you should be able to at least SEE the logical > volume as long as there is a path to the physical volumes on the other > nodes. Are you able to see the same block devices (eg /dev/sd?) on > the other nodes? > > Shawn Hood > > > > 2008/4/14 nch : >> >> Hello, everybody. >> >> I'm trying to run a cluster with 3 nodes. One of them would share >> storage >> with the other two using GFS and DLM (kernel 2.4.18-6). >> I was able to start ccsd, cman, fenced and clvmd in all nodes. I've >> defined >> a logical volume in the storage node and was able to gfs_mkfs, activate >> it >> locally and mount it, but I don't know how to make it available/visible >> to >> the other two nodes. >> Do you know how to do this? >> I've followed instruction given in >> http://sources.redhat.com/cluster/doc/usage.txt (except for setting >> locking_type=2). >> >> Many thanks. >> >> From lexi.herrera at gmail.com Tue Apr 15 16:49:08 2008 From: lexi.herrera at gmail.com (Lexi Herrera) Date: Tue, 15 Apr 2008 12:49:08 -0400 Subject: [Linux-cluster] red hat enterprise Message-ID: <6c3ea40804150949i45f3b50cs3569e96221a2ea90@mail.gmail.com> hi everybody, i am new in my job and new in linux and not have enough experience with this installations, i know solaris, but don't worry. I need install a high performance cluster with red hat enterprise and the information that i knows is this: - red hat enterprise as 4.5 - torche - pvm - mpi - promax - focus - gpfs or better - ganglia or better please i need a lot aid to make this installation. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jruemker at redhat.com Tue Apr 15 18:04:46 2008 From: jruemker at redhat.com (John Ruemker) Date: Tue, 15 Apr 2008 14:04:46 -0400 Subject: [Linux-cluster] how can I share a logical volume? In-Reply-To: <41086.10.8.105.69.1208271990.squirrel@secure.ntsg.umt.edu> References: <694493.83478.qm@web32404.mail.mud.yahoo.com> <41086.10.8.105.69.1208271990.squirrel@secure.ntsg.umt.edu> Message-ID: <4804EE3E.4000005@redhat.com> When you say you want to share storage with the other 2 nodes, do you mean only one node is physically connected to the storage and exports it to the other 2? Or do you mean that all three nodes are connected to the same storage and they share the device? If the former, gnbd is probably what you want. If the latter, you should see the same devices (/dev/sdX) on each node. If you do not, you have misconfigured your HBA or LUNs. Once they all see the same devices, you should be able to start clvmd and all 3 will see the clustered volume group. John Andrew A. Neuschwander wrote: > Did you setup the node with the storage as a gnbd server and the other > nodes as gnbd clients? I think this is what you want in order for the > nodes to all have block level access to the storage for clvmd and dlm to > run on top of. > > -A > From shawnlhood at gmail.com Tue Apr 15 18:26:47 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Tue, 15 Apr 2008 14:26:47 -0400 Subject: [Linux-cluster] red hat enterprise In-Reply-To: <6c3ea40804150949i45f3b50cs3569e96221a2ea90@mail.gmail.com> References: <6c3ea40804150949i45f3b50cs3569e96221a2ea90@mail.gmail.com> Message-ID: I hate to break it to you, but this kind of message isn't going to get you anywhere. I can assure you that many who read this message are thinking RTFM (see http://en.wikipedia.org/wiki/RTFM). You're going to have to hit the books like the rest of us. Shawn Hood 2008/4/15 Lexi Herrera : > > > hi everybody, i am new in my job and new in linux and not have enough > experience with this installations, i know solaris, but don't worry. I need > install a high performance cluster with red hat enterprise and the > information that i knows is this: > > > > - red hat enterprise as 4.5 > > - torche > > - pvm > > - mpi > > - promax > > - focus > > - gpfs or better > > - ganglia or better > > > > please i need a lot aid to make this installation. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Shawn Hood (910) 670-1819 Mobile From underscore_dot at yahoo.com Tue Apr 15 20:21:16 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 15 Apr 2008 13:21:16 -0700 (PDT) Subject: [Linux-cluster] how can I share a logical volume? Message-ID: <757725.56266.qm@web32403.mail.mud.yahoo.com> The gnbd approach seems to fit, although don't quite undestand why this approach is better than the one I started with. However, I upgraded to kernel 2.6.24, cluster-2.03.00, ais, ... and followed instructions described in doc/min-gfs.txt in ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.00.tar.gz. I reached the point in which I could gfs_mkfs, but I could not mount the new fs cause it complains about insufficient number of journals (I tried 2 and 4 journals) while having only one cluster node and the gnbd server. Kind regards ----- Original Message ---- From: Andrew A. Neuschwander To: linux clustering Sent: Tuesday, April 15, 2008 5:06:30 PM Subject: Re: [Linux-cluster] how can I share a logical volume? 
Did you setup the node with the storage as a gnbd server and the other nodes as gnbd clients? I think this is what you want in order for the nodes to all have block level access to the storage for clvmd and dlm to run on top of. -A -- Andrew A. Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 On Tue, April 15, 2008 12:22 am, nch wrote: > No, I can't see the logical volumes on the other nodes. vgscan doesn't > show any, nor I can find any new devices in /dev. > As I couldn't find docs/examples on this particular point, I really don't > know what to expect. > I'm trying with different types of logical volumes (stripped, mirrored), > but didn't make a difference. > For the moment, I'm running all the stuff on virtual machines, could this > be an issue? > For the moment I'm using a minimal cluster.conf, in which I just declare > the nodes. Should I add specific configurations to it? > > Lots of thanks. > > ----- Original Message ---- > From: Shawn Hood > To: linux clustering > Sent: Tuesday, April 15, 2008 2:06:07 AM > Subject: Re: [Linux-cluster] how can I share a logical volume? > > As far as I know, you should be able to at least SEE the logical > volume as long as there is a path to the physical volumes on the other > nodes. Are you able to see the same block devices (eg /dev/sd?) on > the other nodes? > > Shawn Hood > > > > 2008/4/14 nch : >> >> Hello, everybody. >> >> I'm trying to run a cluster with 3 nodes. One of them would share >> storage >> with the other two using GFS and DLM (kernel 2.4.18-6). >> I was able to start ccsd, cman, fenced and clvmd in all nodes. I've >> defined >> a logical volume in the storage node and was able to gfs_mkfs, activate >> it >> locally and mount it, but I don't know how to make it available/visible >> to >> the other two nodes. >> Do you know how to do this? >> I've followed instruction given in >> http://sources.redhat.com/cluster/doc/usage.txt (except for setting >> locking_type=2). >> >> Many thanks. >> >> -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From underscore_dot at yahoo.com Tue Apr 15 20:38:49 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 15 Apr 2008 13:38:49 -0700 (PDT) Subject: [Linux-cluster] how can I share a logical volume? Message-ID: <997501.91333.qm@web32401.mail.mud.yahoo.com> Only one node is phisically connected, at this moment (this might change in the future). I was able to create a logical volume and mount it locally, so I assume everything was correctly connected. Am I wrong? Thank you. ----- Original Message ---- From: John Ruemker To: linux clustering Sent: Tuesday, April 15, 2008 8:04:46 PM Subject: Re: [Linux-cluster] how can I share a logical volume? When you say you want to share storage with the other 2 nodes, do you mean only one node is physically connected to the storage and exports it to the other 2? Or do you mean that all three nodes are connected to the same storage and they share the device? If the former, gnbd is probably what you want. If the latter, you should see the same devices (/dev/sdX) on each node. 
If you do not, you have misconfigured your HBA or LUNs. Once they all see the same devices, you should be able to start clvmd and all 3 will see the clustered volume group. John Andrew A. Neuschwander wrote: > Did you setup the node with the storage as a gnbd server and the other > nodes as gnbd clients? I think this is what you want in order for the > nodes to all have block level access to the storage for clvmd and dlm to > run on top of. > > -A > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lexi.herrera at gmail.com Tue Apr 15 20:47:05 2008 From: lexi.herrera at gmail.com (Lexi Herrera) Date: Tue, 15 Apr 2008 16:47:05 -0400 Subject: [Linux-cluster] red hat enterprise In-Reply-To: References: <6c3ea40804150949i45f3b50cs3569e96221a2ea90@mail.gmail.com> Message-ID: <6c3ea40804151347n3edb4624o410af2701996d2a3@mail.gmail.com> thank you very much by its sincere and fast answer. On Tue, Apr 15, 2008 at 2:26 PM, Shawn Hood wrote: > I hate to break it to you, but this kind of message isn't going to get > you anywhere. I can assure you that many who read this message are > thinking RTFM (see http://en.wikipedia.org/wiki/RTFM). You're going > to have to hit the books like the rest of us. > > Shawn Hood > > 2008/4/15 Lexi Herrera : > > > > > > hi everybody, i am new in my job and new in linux and not have enough > > experience with this installations, i know solaris, but don't worry. I > need > > install a high performance cluster with red hat enterprise and the > > information that i knows is this: > > > > > > > > - red hat enterprise as 4.5 > > > > - torche > > > > - pvm > > > > - mpi > > > > - promax > > > > - focus > > > > - gpfs or better > > > > - ganglia or better > > > > > > > > please i need a lot aid to make this installation. > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Shawn Hood > (910) 670-1819 Mobile > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From underscore_dot at yahoo.com Wed Apr 16 10:00:42 2008 From: underscore_dot at yahoo.com (nch) Date: Wed, 16 Apr 2008 03:00:42 -0700 (PDT) Subject: [Linux-cluster] how can I share a logical volume? Message-ID: <21508.16024.qm@web32401.mail.mud.yahoo.com> This is the exact error message: client1:# mount -t gfs /dev/gnbd/sharedvol /mnt Trying to join cluster "lock_dlm", "testcluster:testfs" dlm: Using TCP for communications Joined cluster. Now mounting FS... GFS: fsid=testcluster:testfs.4294867295: can't mount journal #4294867295 GFS: fsid=testcluster:testfs.4294867295: there are only 6 journals (0 - 5) I can't find anyone else having issue. Can you figure out why is this happening? Cheers ----- Original Message ---- From: John Ruemker To: linux clustering Sent: Tuesday, April 15, 2008 8:04:46 PM Subject: Re: [Linux-cluster] how can I share a logical volume? When you say you want to share storage with the other 2 nodes, do you mean only one node is physically connected to the storage and exports it to the other 2? 
Or do you mean that all three nodes are connected to the same storage and they share the device? If the former, gnbd is probably what you want. If the latter, you should see the same devices (/dev/sdX) on each node. If you do not, you have misconfigured your HBA or LUNs. Once they all see the same devices, you should be able to start clvmd and all 3 will see the clustered volume group. John Andrew A. Neuschwander wrote: > Did you setup the node with the storage as a gnbd server and the other > nodes as gnbd clients? I think this is what you want in order for the > nodes to all have block level access to the storage for clvmd and dlm to > run on top of. > > -A > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From punit_j at rediffmail.com Wed Apr 16 12:05:14 2008 From: punit_j at rediffmail.com (punit_j) Date: 16 Apr 2008 12:05:14 -0000 Subject: [Linux-cluster] errors for members joining cluster Message-ID: <20080416120514.7225.qmail@f5mail-237-211.rediffmail.com> ? Hi All, I have created a 2 node active passive cluster. When i am starting the ccsd deamon and cman daemon i get following in the /var/log/messages :- Apr 16 11:46:38 wesnet-store ccsd[2395]: Cluster is not quorate. Refusing connection. Apr 16 11:46:38 wesnet-store ccsd[2395]: Error while processing connect: Connection refused Apr 16 11:46:38 wesnet-store clurgmgrd[2788]: #5: Couldn't connect to ccsd! Apr 16 11:46:38 wesnet-store clurgmgrd[2788]: #8: Couldn't initialize services Apr 16 13:16:13 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5370 seconds. Apr 16 13:16:43 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5400 seconds. Apr 16 13:17:13 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5430 seconds. Apr 16 13:17:44 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5460 seconds. Apr 16 13:18:14 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5490 seconds. Apr 16 13:18:44 wesnet-store ccsd[2395]: Unable to connect to cluster infrastructure after 5520 seconds. Moreover the strange thing is it is trying to connect to another cluster in same network. My cluster has a different name as compared to another cluster running in the network. Can anyone point out what can be the issue? Regards, -Punit -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Wed Apr 16 12:19:57 2008 From: denisb+gmane at gmail.com (denis) Date: Wed, 16 Apr 2008 14:19:57 +0200 Subject: [Linux-cluster] GFS in RHCS on RHEL5.1 Message-ID: Hi, I was under the impression that installing Red Hat Cluster Suite with GFS in RHEL5.1 was a "supported" solution, but a colleague informed me that the GFS version in RHEL5.x is currently a technology preview?! Is this correct, and if so, what is my best option for running a shared filesystem between my clusternodes (i have a SAN available)? 
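
For orientation, with a SAN already in place the GFS1 route that the replies point at reduces to roughly the following once cman and clvmd are running on every node (the LUN path, VG/LV names, sizes and cluster name are placeholders):

    # on one node: clustered VG on the SAN LUN, then gfs1 with one journal per
    # node that will ever mount it (-j 3 here); -t must match the cluster name
    pvcreate /dev/mapper/san_lun
    vgcreate -c y vg_shared /dev/mapper/san_lun
    lvcreate -L 200G -n lv_gfs vg_shared
    gfs_mkfs -p lock_dlm -t mycluster:shared -j 3 /dev/vg_shared/lv_gfs

    # on every node:
    mount -t gfs /dev/vg_shared/lv_gfs /mnt/shared

    # journals can be added later with gfs_jadd if more nodes join
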
Regards -- Denis From fog at t.is Wed Apr 16 12:20:43 2008 From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=) Date: Wed, 16 Apr 2008 12:20:43 -0000 Subject: [Linux-cluster] GFS in RHCS on RHEL5.1 In-Reply-To: References: Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F77901F9E61F@SKYHQAMX08.klasi.is> Hi, GFS version 1 is supported, however version 2 is currently in a technical preview. Bgrds, Finnur -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of denis Sent: 16. apr?l 2008 12:20 To: linux-cluster at redhat.com Subject: [Linux-cluster] GFS in RHCS on RHEL5.1 Hi, I was under the impression that installing Red Hat Cluster Suite with GFS in RHEL5.1 was a "supported" solution, but a colleague informed me that the GFS version in RHEL5.x is currently a technology preview?! Is this correct, and if so, what is my best option for running a shared filesystem between my clusternodes (i have a SAN available)? Regards -- Denis -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Wed Apr 16 12:27:48 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 16 Apr 2008 13:27:48 +0100 (BST) Subject: [Linux-cluster] GFS in RHCS on RHEL5.1 In-Reply-To: References: Message-ID: On Wed, 16 Apr 2008, denis wrote: > Hi, > > I was under the impression that installing Red Hat Cluster Suite with > GFS in RHEL5.1 was a "supported" solution, but a colleague informed me > that the GFS version in RHEL5.x is currently a technology preview?! > > Is this correct, and if so, what is my best option for running a shared > filesystem between my clusternodes (i have a SAN available)? Your colleague is semi-misinformed. GFS1 is stable (and has been for years) and available in RHEL5. GFS2 is tech preview. So if you are deploying a cluster right now, do it with GFS1. GFS2 is not yet recommended for production use. Gordan From denisb+gmane at gmail.com Wed Apr 16 13:38:17 2008 From: denisb+gmane at gmail.com (denis) Date: Wed, 16 Apr 2008 15:38:17 +0200 Subject: [Linux-cluster] Re: GFS in RHCS on RHEL5.1 In-Reply-To: References: Message-ID: gordan at bobich.net wrote: >> I was under the impression that installing Red Hat Cluster Suite with >> GFS in RHEL5.1 was a "supported" solution, but a colleague informed me >> that the GFS version in RHEL5.x is currently a technology preview?! > Your colleague is semi-misinformed. > GFS1 is stable (and has been for years) and available in RHEL5. GFS2 is > tech preview. So if you are deploying a cluster right now, do it with > GFS1. GFS2 is not yet recommended for production use. Thanks for the information. My follow-up question is whether GFS (1) tolerates high performance situations (with lots of concurrent writes / read access / high number of files)? Regards -- Denis From gordan at bobich.net Wed Apr 16 13:42:29 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 16 Apr 2008 14:42:29 +0100 (BST) Subject: [Linux-cluster] Re: GFS in RHCS on RHEL5.1 In-Reply-To: References: Message-ID: On Wed, 16 Apr 2008, denis wrote: >>> I was under the impression that installing Red Hat Cluster Suite with >>> GFS in RHEL5.1 was a "supported" solution, but a colleague informed me >>> that the GFS version in RHEL5.x is currently a technology preview?! >> Your colleague is semi-misinformed. >> GFS1 is stable (and has been for years) and available in RHEL5. GFS2 is >> tech preview. 
So if you are deploying a cluster right now, do it with >> GFS1. GFS2 is not yet recommended for production use. > > Thanks for the information. My follow-up question is whether GFS (1) > tolerates high performance situations (with lots of concurrent writes / > read access / high number of files)? As much as any other similar system does. If your heavy writes with lots of files are all in the same directory, then you will get contention and performance degradation, as writing to a directory (e.g. file creation) requires a directory lock. Something like Maildir with few user accounts won't perform brilliantly. Maildir with lots of user accounts, isn't too bad. There are also tuning parameters you can apply (e.g. lock pruning) that help in such cases. The only way you will know for sure is to try it and see for your particular application. If GFS can't handle it despite optimizations, you could always try OCFS2 or GlusterFS (please do post back with your findings if you get that far, I've not seen a decent real-world comparison recently), but if you are after clusterable scalability, then your application will likely need to be made/modified in such a way that it doesn't trip over issues inherent in clustering (and there will be FS lock contention issues with _ANY_ scaleable cluster FS). Gordan From Harri.Paivaniemi at tietoenator.com Thu Apr 17 05:40:24 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Thu, 17 Apr 2008 08:40:24 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 References: Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> Hi all, Short introduction: My name is Harry, I'm working in Helsinki, Finland and have used RHCS from the beginning, we have currently 7 clusters mainly running MySQL/Oracle databases. I tought I have some kind of knowledge about this clustering software and everything seemed to be ok until version 5. I don't have problems or severe bugging in any of RH4- clusters. But.... Tryed to move --> 5.1 with 64-bit HP Blades. Cluster just won't work or it works but I don't have any kind of trust to it anymore. I have made about 20 different scenarios and there is totally too much problems, couple of those will prevent me to use this anymore. I have created 3 tickets to RH support and it seems to me that they don't know that little what I know. I have had to tell them 2 times to read the f...g manual, because they have spoken directly agains qdisk man-page. They just don't know how it should work... hard to believe but tru. First, I asked how to change cman deadnode_timeout in 5, because /proc doesn't anymore have it and that parameter didn't work on my tests. Support said "you can't tune the timeout at all". I asked, how can I use qdisk if man page says cman's timeout must be > than qdisk eviction timeout.... and told them to read the man-page... finally I found myself the correct parameter "totem token" Second time, they said in my 2-node cluster I made a mistake when I gave 1 vote for the quorum disk... but man-page again tell's to do that and of course it is correct in 2-node cluster.... So, this is my sad history with ver 5. Do you use 64-bit ver 5 and what's your feeling? My problems this time are: 1. 2-node cluster. Can't start only one node to get cluster services up - it hangs in fencing and waits until I start te second node and immediately after that, when both nodes are starting cman, the cluster comes up. 
So if I have lost one node, I can't get the cluster up, if I have to restart for seome reason the working node. It should work like before (both nodes are down, I start one, it fences another and comes up). Now it just waits... log says: ccsd[25272]: Error while processing connect: Connection refused This is so common error message, that it just tell's nothing to me.... 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the same time) to get it up. Works ok, qdisk works, heuristic works. Everything works. If I stop cluster daemons on one node, that node can't join to cluster anymore without a complete reboot. It joins, another node says ok, the node itself says ok, quorum is registred and heuristic is up, but the node's quorum-disk stays offline and another node says this node is offline. If I reboot this machine, it joins to cluster ok. 3. Funny thing: heuristic ping didn't work at all in the beginning and support gave me a "ping-script" which make it to work... so this describes quite well how experimental this cluster is nowadays... I have to tell you it is a FACT that basics are ok: fencing works ok in a normal situation, I don't have typos, configs are in sync, everything is ok, but these problems still exists. I have 2 times sent sosreports etc. so RH support. They hava spent 3 weeks and still can't say whats wrong... Just if somebody has something in mind to help... Thanks, -hjp -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4384 bytes Desc: not available URL: From p.elmers at gmx.de Thu Apr 17 07:08:25 2008 From: p.elmers at gmx.de (Peter) Date: Thu, 17 Apr 2008 09:08:25 +0200 Subject: [Linux-cluster] Meaning of Cluster Cycle and timeout problems Message-ID: Hi! In our Cluster we have the following entry in the "messages" logfile: "qdiskd[4314]: qdisk cycle took more than 3 seconds to complete (3.890000)" Theese messages are very frequent. I can not find anything except the source code via google and i am sorry to say that i am not so familar with c to get the point. We also have sometimes a quorum timeout: "kernel: CMAN: Quorum device /dev/sdh timed out" Are theese two messages independent and what is the meaning of the first message? Thanks for reading and answering :) Peter -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2209 bytes Desc: not available URL: From denisb+gmane at gmail.com Thu Apr 17 07:43:42 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 17 Apr 2008 09:43:42 +0200 Subject: [Linux-cluster] Re: Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> Message-ID: Hi Harri, Please consider using a smart mailclient which can wrap your mails.. It makes things a lot easier. I have a 2 node cluster RHEL5.1 on 64bit (IBM Blades) configured which works well for me (so far). Harri.Paivaniemi at tietoenator.com wrote: > 1. 2-node cluster. Can't start only one node to get cluster services >up - it hangs in fencing and waits until I start te second node and >immediately after that, when both nodes are starting cman, the cluster >comes up. For me fencing would hang for 3-4 minutes before I had properly configured fencing (with manual fallback), after adding manual fallback I no longer have this issue. 
When a node is fenced / rebooted, every now and then it will fence the other node on restart. And if I shutdown and restart both nodes simultaneously they will fence like this (has happened 2 times on tests) A boots quicker and establishes the cluster B fences A when it starts A fences B when it starts B boots and joins the cluster.. This was before I configured qdisk, and I have not yet tested this behaviour after qdisk was setup. > 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the qdisk works well for me in a very similar setup. I have done manual fencing tests / hardboot tests, which didn't produce anything like you describe. > 3. Funny thing: heuristic ping didn't work at all in the beginning >and support gave me a "ping-script" which make it to work... so this >describes quite well how experimental this cluster is nowadays... heuristic ping works fine for me out of the box.. Regards -- Denis From harri.paivaniemi at tietoenator.com Thu Apr 17 07:58:22 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Thu, 17 Apr 2008 10:58:22 +0300 Subject: [Linux-cluster] Re: Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> Message-ID: <1208419102.21043.15.camel@hjpsuse.tebit> Sorry, if my email structure was someway messed up - I had to use our webmail (Outlook web access), normally I use Evolution in SuSE... Anyway, thanks for your comments, Denis. -hjp On Thu, 2008-04-17 at 09:43 +0200, denis wrote: > Hi Harri, > > Please consider using a smart mailclient which can wrap your mails.. It > makes things a lot easier. > > I have a 2 node cluster RHEL5.1 on 64bit (IBM Blades) configured which > works well for me (so far). > > Harri.Paivaniemi at tietoenator.com wrote: > > 1. 2-node cluster. Can't start only one node to get cluster services > >up - it hangs in fencing and waits until I start te second node and > >immediately after that, when both nodes are starting cman, the cluster > >comes up. > > For me fencing would hang for 3-4 minutes before I had properly > configured fencing (with manual fallback), after adding manual fallback > I no longer have this issue. > > When a node is fenced / rebooted, every now and then it will fence the > other node on restart. And if I shutdown and restart both nodes > simultaneously they will fence like this (has happened 2 times on tests) > > A boots quicker and establishes the cluster > B fences A when it starts > A fences B when it starts > B boots and joins the cluster.. > > This was before I configured qdisk, and I have not yet tested this > behaviour after qdisk was setup. > > > 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the > > qdisk works well for me in a very similar setup. I have done manual > fencing tests / hardboot tests, which didn't produce anything like you > describe. > > > 3. Funny thing: heuristic ping didn't work at all in the beginning > >and support gave me a "ping-script" which make it to work... so this > >describes quite well how experimental this cluster is nowadays... > > heuristic ping works fine for me out of the box.. 
> > Regards > -- > Denis > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Thu Apr 17 08:17:30 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 17 Apr 2008 09:17:30 +0100 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> Message-ID: <4807079A.5060909@bobich.net> Harri.Paivaniemi at tietoenator.com wrote: > So, this is my sad history with ver 5. Do you use 64-bit ver 5 and what's your feeling? I only started using it with v5, and I have to say that I haven't had any real problems. Some of my clusters have been 64-bit, some 32-bit, and I haven't seen any differences yet. > My problems this time are: > > 1. 2-node cluster. Can't start only one node to get cluster services up - it hangs in fencing and waits until I start te second node and immediately after that, when both nodes are starting cman, the cluster comes up. So if I have lost one node, I can't get the cluster up, if I have to restart for seome reason the working node. It should work like before (both nodes are down, I start one, it fences another and comes up). Now it just waits... log says: > > ccsd[25272]: Error while processing connect: Connection refused > > This is so common error message, that it just tell's nothing to me.... I have seen similar error messages before, and it has usually been caused by either the node names/interfaces/IPs not being listed correctly in /etc/hosts file, or iptables firewalling rules blocking communication between the nodes. > 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the same time) to get it up. Works ok, qdisk works, heuristic works. Everything works. If I stop cluster daemons on one node, that node can't join to cluster anymore without a complete reboot. It joins, another node says ok, the node itself says ok, quorum is registred and heuristic is up, but the node's quorum-disk stays offline and another node says this node is offline. If I reboot this machine, it joins to cluster ok. I believe it's supposed to work that way. When a node fails it needs to be fully restarted before it is allowed back into the cluster. I'm sure this has been mentioned on the list recently. > 3. Funny thing: heuristic ping didn't work at all in the beginning and support gave me a "ping-script" which make it to work... so this describes quite well how experimental this cluster is nowadays... > > I have to tell you it is a FACT that basics are ok: fencing works ok in a normal situation, I don't have typos, configs are in sync, everything is ok, but these problems still exists. I've been in similar situations before, but in the end it always turned out to be me doing something silly (see above re: host files and iptables as examples). > I have 2 times sent sosreports etc. so RH support. They hava spent 3 weeks and still can't say whats wrong... Sadly, that seems to be the quality of commercial support from any vendor. Support nowdays seems to have only one purpose - managerial back-covering exercise so they can pass the buck. I have always found that community support is several orders of magnitude better than commercial support in terms of both response speed and quality. 
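Coming back to the iptables point above: when it is the firewall, opening the cluster ports between the nodes is usually enough. A sketch from memory of the RHEL5 documentation (double-check the port list against your release; 10.1.1.0/24 is just an example cluster subnet):

  # openais/cman (totem)
  iptables -A INPUT -s 10.1.1.0/24 -p udp --dport 5404:5405 -j ACCEPT
  # ccsd
  iptables -A INPUT -s 10.1.1.0/24 -p tcp -m multiport --dports 50006,50008,50009 -j ACCEPT
  iptables -A INPUT -s 10.1.1.0/24 -p udp --dport 50007 -j ACCEPT
  # dlm
  iptables -A INPUT -s 10.1.1.0/24 -p tcp --dport 21064 -j ACCEPT
  # rgmanager
  iptables -A INPUT -s 10.1.1.0/24 -p tcp --dport 41966:41969 -j ACCEPT

It is also worth confirming that the node names used in cluster.conf resolve to the addresses you expect on every node, e.g. with getent hosts <nodename>.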
Gordan From harri.paivaniemi at tietoenator.com Thu Apr 17 08:30:30 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Thu, 17 Apr 2008 11:30:30 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <4807079A.5060909@bobich.net> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> Message-ID: <1208421030.21043.24.camel@hjpsuse.tebit> No, I strongly believe it should not work that way. To my mind, it should work like this: - 2 nodes up'n running, everything ok - shutdown cluster daemons on node b - node b tells node a "I'm going administrative down", node a is decreasing cluster votes from 3 --> 2 - node a is happy, no fencing - start node b's cluster daemons - joins to cluster normally - gains quorum device normally, cluster votes back --> 3 Of course it's different if node b fails, but this is not failing, it's administrative shutdown and node a is informed. If I halt node b, it's fenced ok by node a, as it should be, it reboots and joins to cluster normally. -hjp On Thu, 2008-04-17 at 09:17 +0100, Gordan Bobic wrote: > > 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the > same time) to get it up. Works ok, qdisk works, heuristic works. > Everything works. If I stop cluster daemons on one node, that node > can't join to cluster anymore without a complete reboot. It joins, > another node says ok, the node itself says ok, quorum is registred and > heuristic is up, but the node's quorum-disk stays offline and another > node says this node is offline. If I reboot this machine, it joins to > cluster ok. > > I believe it's supposed to work that way. When a node fails it needs > to > be fully restarted before it is allowed back into the cluster. I'm > sure > this has been mentioned on the list recently. From johannes.russek at io-consulting.net Thu Apr 17 09:18:51 2008 From: johannes.russek at io-consulting.net (jr) Date: Thu, 17 Apr 2008 11:18:51 +0200 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <4807079A.5060909@bobich.net> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> Message-ID: <1208423931.27774.2.camel@admc.win-rar.local> Am Donnerstag, den 17.04.2008, 09:17 +0100 schrieb Gordan Bobic: > > 1. 2-node cluster. Can't start only one node to get cluster services up - it hangs in fencing and waits until I start te second node and immediately after that, when both nodes are starting cman, the cluster comes up. So if I have lost one node, I can't get the cluster up, if I have to restart for seome reason the working node. It should work like before (both nodes are down, I start one, it fences another and comes up). Now it just waits... log says: > > > > ccsd[25272]: Error while processing connect: Connection refused > > > > This is so common error message, that it just tell's nothing to me.... > > I have seen similar error messages before, and it has usually been > caused by either the node names/interfaces/IPs not being listed > correctly in /etc/hosts file, or iptables firewalling rules blocking > communication between the nodes. or if the cluster isn't quorate, i believe cman refuses to accept any connections. 
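A quick way to check, assuming the stock RHEL5 tools (the last command forces expected votes down and is only safe if you are certain the other node really is dead):

  cman_tool status | egrep 'Membership|Expected|Total|Quorum'
  clustat
  cman_tool expected -e 1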
regards, johannes From harri.paivaniemi at tietoenator.com Thu Apr 17 09:28:47 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Thu, 17 Apr 2008 12:28:47 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208423931.27774.2.camel@admc.win-rar.local> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> Message-ID: <1208424527.21043.33.camel@hjpsuse.tebit> Well, I don't have any mistakes with firewalls, hosts, names, ip's etc. This is a fact. Communication itself works. Maby it sounds strange when I say I don't have mistakes, but this time it's true ;) In this case cluster should gain quorum and start running services on node a (it has 2 votes (node-vote + qdisk-vote). It should fence node b first, because it doesn't know where it is. So this behaviour is wrong. -hjp On Thu, 2008-04-17 at 11:18 +0200, jr wrote: > Am Donnerstag, den 17.04.2008, 09:17 +0100 schrieb Gordan Bobic: > > > > 1. 2-node cluster. Can't start only one node to get cluster services up - it hangs in fencing and waits until I start te second node and immediately after that, when both nodes are starting cman, the cluster comes up. So if I have lost one node, I can't get the cluster up, if I have to restart for seome reason the working node. It should work like before (both nodes are down, I start one, it fences another and comes up). Now it just waits... log says: > > > > > > ccsd[25272]: Error while processing connect: Connection refused > > > > > > This is so common error message, that it just tell's nothing to me.... > > > > I have seen similar error messages before, and it has usually been > > caused by either the node names/interfaces/IPs not being listed > > correctly in /etc/hosts file, or iptables firewalling rules blocking > > communication between the nodes. > > or if the cluster isn't quorate, i believe cman refuses to accept any > connections. > > regards, > johannes > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From johannes.russek at io-consulting.net Thu Apr 17 09:32:43 2008 From: johannes.russek at io-consulting.net (jr) Date: Thu, 17 Apr 2008 11:32:43 +0200 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208424527.21043.33.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> Message-ID: <1208424763.27774.6.camel@admc.win-rar.local> Am Donnerstag, den 17.04.2008, 12:28 +0300 schrieb Harri P?iv?niemi: > Well, > > I don't have any mistakes with firewalls, hosts, names, ip's etc. This > is a fact. Communication itself works. Maby it sounds strange when I say > I don't have mistakes, but this time it's true ;) > > In this case cluster should gain quorum and start running services on > node a (it has 2 votes (node-vote + qdisk-vote). > > It should fence node b first, because it doesn't know where it is. > > So this behaviour is wrong. > > -hjp i think something is wrong here, like the expected votes or similiar. if the one node had 2 votes and those were the expected votes, it would maintain quorum and thus fence the other node. that connection refused error seems to say that that node doesn't have the quorum nonetheless. can you confirm that? 
(clustat should show you if that node is quorate or not)
regards,
johannes

From Laurent.WEISLO at ext.ec.europa.eu Thu Apr 17 09:38:52 2008
From: Laurent.WEISLO at ext.ec.europa.eu (Laurent.WEISLO at ext.ec.europa.eu)
Date: Thu, 17 Apr 2008 11:38:52 +0200
Subject: [Linux-cluster] How to separate inter-cluster/public network traffic
Message-ID: <867B2D3FDCEAE947AE9DEEF2D5BD4F0601E35ED4@S-DC-EXM22.net1.cec.eu.int>

Hi,

I'm running RedHat 5.2 Beta (Tikanga) and I'm trying to achieve this behavior:

- 2 NICs (eth0, eth1) for one bond0 device intended for public LAN XXX.XXX.XXX.XXX traffic
- 2 NICs (eth2, eth3) for one bond1 device intended for inter-cluster LAN YYY.YYY.YYY.YYY traffic
- NodeA and NodeB IP addresses are in LAN XXX.XXX.XXX.XXX

In cluster.conf:

Unfortunately all the cluster traffic is bound to bond0:

[root at NodeA]# ip maddr show bond0
...
inet 239.192.NN.NN
inet 224.0.0.1
...
[root at NodeA]# ip maddr show bond1
...
inet 224.0.0.1
...

Is it possible to do it like that (all the clusters have 2 VLANs with bonding each)?
If not, should I put NodeA and NodeB into the LAN YYY.YYY.YYY.YYY in cluster.conf?

Thx for your help !
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From harri.paivaniemi at tietoenator.com Thu Apr 17 09:58:10 2008
From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=)
Date: Thu, 17 Apr 2008 12:58:10 +0300
Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1
In-Reply-To: <1208424763.27774.6.camel@admc.win-rar.local>
References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local>
Message-ID: <1208426290.21043.48.camel@hjpsuse.tebit>

Yes, something is wrong. I did a little more research:

- stop cluster daemons on both nodes (node a & node b)
- start cluster on node a

It hangs 5 minutes on cman's fencing part like this:

Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing...

... and in the process list there is this:

/sbin/fence_tool -w -t 300 join

... so that's the 5 minutes. Question is: why does it wait there 5 minutes?

- after 5 minutes of waiting, node a says:

Starting fencing... failed                         [FAILED]
Starting the Quorum Disk Daemon:                   [  OK  ]
Starting Cluster Service Manager:                  [  OK  ]

... and then it loads qdiskd and after a while it has 2 votes and it starts services normally, and voila, I have a running cluster with one node:

Node  Sts   Inc   Joined               Name
   0   M      0   2008-04-17 12:51:01  /dev/sda
   1   M   1356   2008-04-17 12:45:44  areenasql1
   2   X      0                        areenasql2

[root at areenasql1 ~]# cman_tool status
Version: 6.0.1
Config Version: 4
Cluster Name: areena_sql
Cluster Id: 39330
Cluster Member: Yes
Cluster Generation: 1356
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags:
Ports Bound: 0 177
Node name: areenasql1
Node ID: 1
Multicast addresses: 239.192.153.60
Node addresses: 10.1.1.178

But the log says nothing about that failed fencing. Fencing is configured correctly, I use HP iLO and everything is ok. Fencing works ok in a running cluster; both nodes can fence each other.

Node a should fence node b in this situation, and maybe it's trying to do it somehow, but it logs nothing. It should log at least "fence failed" etc. if it's unable to fence node b...
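For what it's worth, vote counts like the ones above are what you would get from a qdisk setup roughly along these lines in cluster.conf (a sketch only; the label, the timings and the heuristic target address are made up):

  <cman expected_votes="3" two_node="0"/>
  <quorumd interval="2" tko="10" votes="1" label="areena_qdisk">
    <heuristic program="ping -c1 -w1 10.1.1.1" score="1" interval="2" tko="3"/>
  </quorumd>

One vote per node plus one for the quorum disk gives expected votes 3 and quorum 2, which is exactly what cman_tool reports here.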
And what's more important, if we think node a can't fence node b in this startup situation, it should NOT start services but it starts.... -hjp On Thu, 2008-04-17 at 11:32 +0200, jr wrote: > Am Donnerstag, den 17.04.2008, 12:28 +0300 schrieb Harri P?iv?niemi: > > Well, > > > > I don't have any mistakes with firewalls, hosts, names, ip's etc. This > > is a fact. Communication itself works. Maby it sounds strange when I say > > I don't have mistakes, but this time it's true ;) > > > > In this case cluster should gain quorum and start running services on > > node a (it has 2 votes (node-vote + qdisk-vote). > > > > It should fence node b first, because it doesn't know where it is. > > > > So this behaviour is wrong. > > > > -hjp > > i think something is wrong here, like the expected votes or similiar. if > the one node had 2 votes and those were the expected votes, it would > maintain quorum and thus fence the other node. that connection refused > error seems to say that that node doesn't have the quorum nonetheless. > can you confirm that? (clustat should show you if that node is quorate > or not) > regards, > johannes > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From johannes.russek at io-consulting.net Thu Apr 17 10:12:57 2008 From: johannes.russek at io-consulting.net (jr) Date: Thu, 17 Apr 2008 12:12:57 +0200 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208426290.21043.48.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> Message-ID: <1208427177.27774.8.camel@admc.win-rar.local> do you mind sending your cluster.conf? johannes From harri.paivaniemi at tietoenator.com Thu Apr 17 10:27:45 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Thu, 17 Apr 2008 13:27:45 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208427177.27774.8.camel@admc.win-rar.local> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> Message-ID: <1208428065.21043.60.camel@hjpsuse.tebit> Yes, Cluster.conf attached. I just resolved 1 thing: When node a & b are down (cluster daemons) and I start node a, it hangs 5 minutes in fencing becouse becouse... man fence_tool says: ""Before joining or leaving the fence domain, fence_tool waits for the cluster be in a quorate state"" And in qdisk man- page it's said: ""CMAN must be running before the qdisk program can operate in full capacity. If CMAN is not running, qdisk will wait for it." I started in this order: cman-qdiskd-rgmanager". In this case it hangs because fence is waiting cluster to be quorate and it's not gonna be because qdisk is not yet running ;) Jihaa - so one problem solved. No I can start cluster node at a time. The 2nd problem that still exists is: When node a and b are running and everything is ok. I stop node b's cluster daemons. 
when I start node b again, this situation stays forever: ---------------- node a - clustat Member Status: Quorate Member Name ID Status ------ ---- ---- ------ areenasql1 1 Online, Local, rgmanager areenasql2 2 Offline /dev/sda 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- service:areena areenasql1 started ------------------- node b - clustat Member Status: Quorate Member Name ID Status ------ ---- ---- ------ areenasql1 1 Online, rgmanager areenasql2 2 Online, Local, rgmanager /dev/sda 0 Offline, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- service:areena areenasql1 started So node b's quorum disk is offline, log says it's registred ok and heuristic is UP... node a sees node b as offline. If I reboot node b, it works ok and joins ok... Both nodes sees: Nodes: 2 Expected votes: 3 Total votes: 2 Quorum: 2 -hjp On Thu, 2008-04-17 at 12:12 +0200, jr wrote: > do you mind sending your cluster.conf? > > johannes > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: cluster.conf Type: application/xml Size: 2638 bytes Desc: not available URL: From johannes.russek at io-consulting.net Thu Apr 17 10:57:34 2008 From: johannes.russek at io-consulting.net (jr) Date: Thu, 17 Apr 2008 12:57:34 +0200 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208428065.21043.60.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> Message-ID: <1208429854.27774.19.camel@admc.win-rar.local> indeed, that seems like a good idea to start qdiskd before cman if qdiskd simply waits for cman to come up. RHEL ships with the chkconfig configuration of S21 for cman and S22 for qdiskd. question to the list: shouldn't this be changed? johannes Am Donnerstag, den 17.04.2008, 13:27 +0300 schrieb Harri P?iv?niemi: > Yes, > > Cluster.conf attached. > > > I just resolved 1 thing: > > When node a & b are down (cluster daemons) and I start node a, it hangs > 5 minutes in fencing becouse becouse... > > > man fence_tool says: > > ""Before joining or leaving the fence domain, fence_tool waits for the > cluster be in a quorate state"" > > And in qdisk man- page it's said: > > ""CMAN must be running before the qdisk program can operate in full > capacity. If CMAN is not running, qdisk will wait for it." > > I started in this order: cman-qdiskd-rgmanager". In this case it hangs > because fence is waiting cluster to be quorate and it's not gonna be > because qdisk is not yet running ;) > > Jihaa - so one problem solved. No I can start cluster node at a time. > > > The 2nd problem that still exists is: > > When node a and b are running and everything is ok. I stop node b's > cluster daemons. 
when I start node b again, this situation stays > forever: > > ---------------- > node a - clustat > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, Local, rgmanager > areenasql2 2 Offline > /dev/sda 0 Online, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > ------------------- > > node b - clustat > > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, rgmanager > areenasql2 2 Online, Local, rgmanager > /dev/sda 0 Offline, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > > So node b's quorum disk is offline, log says it's registred ok and > heuristic is UP... node a sees node b as offline. If I reboot node b, it > works ok and joins ok... > > Both nodes sees: > > Nodes: 2 > Expected votes: 3 > Total votes: 2 > Quorum: 2 > > > > -hjp > > > > > > > > On Thu, 2008-04-17 at 12:12 +0200, jr wrote: > > do you mind sending your cluster.conf? > > > > johannes > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Thu Apr 17 11:21:02 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 12:21:02 +0100 (BST) Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208428065.21043.60.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> Message-ID: On Thu, 17 Apr 2008, Harri P?iv?niemi wrote: > I just resolved 1 thing: > > When node a & b are down (cluster daemons) and I start node a, it hangs > 5 minutes in fencing becouse becouse... > > man fence_tool says: > > ""Before joining or leaving the fence domain, fence_tool waits for the > cluster be in a quorate state"" > > And in qdisk man- page it's said: > > ""CMAN must be running before the qdisk program can operate in full > capacity. If CMAN is not running, qdisk will wait for it." > > I started in this order: cman-qdiskd-rgmanager". In this case it hangs > because fence is waiting cluster to be quorate and it's not gonna be > because qdisk is not yet running ;) > > Jihaa - so one problem solved. No I can start cluster node at a time. So - your config was not correct after all? 
;) Gordan From harri.paivaniemi at tietoenator.com Thu Apr 17 11:41:21 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Thu, 17 Apr 2008 14:41:21 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> Message-ID: <1208432481.21043.75.camel@hjpsuse.tebit> Weeeel, In this case my config was right after all - those 2 man- pages are not in sync from the logical point of view... and maby my own logical competence was also part of this confusion ;) But still, I just can't solve this: - 2 nodes up'n running, works totally ok - stop another nodes cman, qdiskd and rgmanager - start those again (qdiskd-cman-rgmanager OR cman-qdiskd-rgmanager OR cman-rgmanager-qdiskd) I'll end up to situation where - restarted node sees quorum device offline and another node online. It has 2 votes so it is quorate - another node sees restarted node offline. It also has 2 votes so it is quorate Node reboot solves the problem. -hjp On Thu, 2008-04-17 at 12:21 +0100, gordan at bobich.net wrote: > On Thu, 17 Apr 2008, Harri P?iv?niemi wrote: > > > I just resolved 1 thing: > > > > When node a & b are down (cluster daemons) and I start node a, it hangs > > 5 minutes in fencing becouse becouse... > > > > man fence_tool says: > > > > ""Before joining or leaving the fence domain, fence_tool waits for the > > cluster be in a quorate state"" > > > > And in qdisk man- page it's said: > > > > ""CMAN must be running before the qdisk program can operate in full > > capacity. If CMAN is not running, qdisk will wait for it." > > > > I started in this order: cman-qdiskd-rgmanager". In this case it hangs > > because fence is waiting cluster to be quorate and it's not gonna be > > because qdisk is not yet running ;) > > > > Jihaa - so one problem solved. No I can start cluster node at a time. > > So - your config was not correct after all? ;) > > Gordan > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Thu Apr 17 11:50:10 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 12:50:10 +0100 (BST) Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208432481.21043.75.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> <1208432481.21043.75.camel@hjpsuse.tebit> Message-ID: On Thu, 17 Apr 2008, Harri P?iv?niemi wrote: > But still, I just can't solve this: > > - 2 nodes up'n running, works totally ok > - stop another nodes cman, qdiskd and rgmanager > - start those again (qdiskd-cman-rgmanager OR cman-qdiskd-rgmanager OR > cman-rgmanager-qdiskd) > > I'll end up to situation where > > - restarted node sees quorum device offline and another node online. It > has 2 votes so it is quorate > > - another node sees restarted node offline. 
It also has 2 votes so it is > quorate > > Node reboot solves the problem. I'm sure I have seen a post on this list recently explaining why this (or a similar condition) is normal and expected. But for some reason I cannot seem to find it... Gordan From alacey at brynmawr.edu Thu Apr 17 15:47:05 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 11:47:05 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? Message-ID: <1032.165.106.200.207.1208447225.squirrel@webmail> I am doing some testing on a 2-node, active/standby RHEL 4 cluster with non-GFS shared storage. I am using HP iLO for fencing. I don't have a quorum disk set up. Both cluster nodes are connected to the same switch, and that network path is used for cluster communication as well as general network communication (including access to iLO). I've found that when the switch goes down and comes back up, the result is not desirable. As soon as the switch loses power, each node starts trying to fence the other. Since the iLO is not reachable, this is unsuccessful, but the nodes keep retrying the fence. When the switch comes back online, the "OK Corral" scenario takes place -- both nodes fence each other simultaneously and bring down the cluster. I have seen some references to the concept of IP-based tie-breakers on a Red Hat cluster, but I'm not sure how to set this up. What I would like is a configuration whereby a node that cannot ping the switch will just sit there in its current state and not attempt to fence the other node. Fencing would only occur when a node can reach the switch but cannot reach the other node. Is this something that can be done? Can someone direct me to documentation? I have a ticket in with Red Hat on this same question, so we'll see who answers first :-) Thanks, -Andrew L From gordan at bobich.net Thu Apr 17 15:55:43 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 16:55:43 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1032.165.106.200.207.1208447225.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> Message-ID: On Thu, 17 Apr 2008, Andrew Lacey wrote: > I am doing some testing on a 2-node, active/standby RHEL 4 cluster with > non-GFS shared storage. I am using HP iLO for fencing. I don't have a > quorum disk set up. Both cluster nodes are connected to the same switch, > and that network path is used for cluster communication as well as general > network communication (including access to iLO). I've found that when the > switch goes down and comes back up, the result is not desirable. As soon > as the switch loses power, each node starts trying to fence the other. > Since the iLO is not reachable, this is unsuccessful, but the nodes keep > retrying the fence. When the switch comes back online, the "OK Corral" > scenario takes place -- both nodes fence each other simultaneously and > bring down the cluster. I had a similar issue, but the solution I went for is doctoring the fencing agent to put in a delay based on node's priority in to the fencing daemon. That way the nodes wouldn't try to fence simultaneously, but in a staggered fashion. If you have a spare NIC, and the nodes are next to each other, you could make them use a cross-over cable for their cluster communication, so they would notice that they are both still up even when the switch dies. That's what I do. 
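Coming back to the doctored fencing agent: it can be as simple as a wrapper script. A sketch, assuming fence_ilo is the real agent and that each node is given a different delay (the path and the 30 second value are arbitrary):

  #!/bin/sh
  # /usr/local/sbin/fence_ilo_delayed
  # Give the peer a head start so both nodes don't shoot at the same instant.
  sleep 30
  # fenced passes the agent its options on stdin; exec keeps that stdin intact.
  exec /sbin/fence_ilo "$@"

Point the fencedevice for the node that should lose the race at the wrapper instead of fence_ilo, and use a shorter (or no) delay on the other node.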
Gordan From Harri.Paivaniemi at tietoenator.com Thu Apr 17 16:01:46 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Thu, 17 Apr 2008 19:01:46 +0300 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? References: <1032.165.106.200.207.1208447225.squirrel@webmail> Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> What Gordan said is true, but you could also just tune deadnode_timeout to be different on both nodes: this results the behaviour Gordan told - the node that has smaller deadnode_timeout would fence first. -hjp -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Andrew Lacey Sent: Thu 4/17/2008 18:47 To: Linux-cluster at redhat.com Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? I am doing some testing on a 2-node, active/standby RHEL 4 cluster with non-GFS shared storage. I am using HP iLO for fencing. I don't have a quorum disk set up. Both cluster nodes are connected to the same switch, and that network path is used for cluster communication as well as general network communication (including access to iLO). I've found that when the switch goes down and comes back up, the result is not desirable. As soon as the switch loses power, each node starts trying to fence the other. Since the iLO is not reachable, this is unsuccessful, but the nodes keep retrying the fence. When the switch comes back online, the "OK Corral" scenario takes place -- both nodes fence each other simultaneously and bring down the cluster. I have seen some references to the concept of IP-based tie-breakers on a Red Hat cluster, but I'm not sure how to set this up. What I would like is a configuration whereby a node that cannot ping the switch will just sit there in its current state and not attempt to fence the other node. Fencing would only occur when a node can reach the switch but cannot reach the other node. Is this something that can be done? Can someone direct me to documentation? I have a ticket in with Red Hat on this same question, so we'll see who answers first :-) Thanks, -Andrew L -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3678 bytes Desc: not available URL: From j.buzzard at dundee.ac.uk Thu Apr 17 16:17:05 2008 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Thu, 17 Apr 2008 17:17:05 +0100 Subject: [Linux-cluster] Fencing Driver API Requirements In-Reply-To: <4803A6C4.20706@redhat.com> References: <4803A6C4.20706@redhat.com> Message-ID: <1208449025.31915.55.camel@localhost.lifesci.dundee.ac.uk> On Mon, 2008-04-14 at 20:47 +0200, Marek 'marx' Grac wrote: > Hi, > > gordan at bobich.net wrote: > > > > I remember that this was mentioned several times in the last few > > months, but has any documentation been put together on the API that > > the fencing drivers are supposed to cover? > > > > I'm looking into writing a fencing driver based on disabling switch > > ports on a managed 3com switch via the telnet interface, and I'd like > > to make sure > > that it conforms to any speciffic requirements that might exist. If > > someone could point me at the relevant URL, that would be most > > appreciated. > > There is a new python module in the git (master branch / > cluster/gence/agents/lib/fencing.py) that should contain everything you > should need to write a fence agent. 
This module was used to built > several agents (they are just in the git tree) eg. apc/apc.py, > drac/drac5.py, wti/wti.py. If you will find any problem with fencing.py, > let me know and I will try to fix it. > The issue is that with such a critical component of a cluster (if the fencing is not right bad things will happen) that in order to write a new fencing agent one has to start reverse engineering from source to work out what you need to do. This is incredibly bad practice, and is bound to lead to improperly implemented fencing agents that then lead to bad things happening on clusters with these fencing agents. There a loads of potential fencing devices out there that could be supported, that are currently not. From my perspective trying to implement a fencing agent for Alert On Lan 2, it was easier to reverse engineer the magic packets of death using tcpdump and IDA pro as well as implementing a C based Linux command tool to generate them, than it has been to write a functioning fencing agent. It would take a couple of hours tops for someone to write a spec for what a fencing agent needs to do. JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH From gordan at bobich.net Thu Apr 17 16:42:35 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 17:42:35 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> Message-ID: On Thu, 17 Apr 2008, Harri.Paivaniemi at tietoenator.com wrote: > What Gordan said is true, > > but you could also just tune deadnode_timeout to be different on both > nodes: this results the behaviour Gordan told - the node that has > smaller deadnode_timeout would fence first. Now, why didn't I know about this before? Much neater than my hack of putting a different sleep in the two fencing agents. :-) Thanks. Gordan From alacey at brynmawr.edu Thu Apr 17 16:44:08 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 12:44:08 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> Message-ID: <1122.165.106.200.207.1208450648.squirrel@webmail> > but you could also just tune deadnode_timeout to be different on both > nodes: this results the behaviour Gordan told - the node that has smaller > deadnode_timeout would fence first. Would this work in a situation where the switch was down for a few minutes? Suppose the deadnode_timeout is 30 seconds on one node and 60 seconds on the other. So, after 60 seconds of switch downtime, both nodes would be trying to fence. If the switch comes up after being down for 5 minutes, they would still immediately fence each other. Or am I not thinking about this correctly? -Andrew L From gordan at bobich.net Thu Apr 17 16:46:41 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 17:46:41 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? 
In-Reply-To: <1122.165.106.200.207.1208450648.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> Message-ID: On Thu, 17 Apr 2008, Andrew Lacey wrote: >> but you could also just tune deadnode_timeout to be different on both >> nodes: this results the behaviour Gordan told - the node that has smaller >> deadnode_timeout would fence first. > > Would this work in a situation where the switch was down for a few > minutes? Suppose the deadnode_timeout is 30 seconds on one node and 60 > seconds on the other. So, after 60 seconds of switch downtime, both nodes > would be trying to fence. If the switch comes up after being down for 5 > minutes, they would still immediately fence each other. Or am I not > thinking about this correctly? There's an argument that if your switch is down for 30 minutes, you have bigger problems. If you have a 30 minute switch outage, the chances are that you can live with the node power-up time on top of that. Gordan From alacey at brynmawr.edu Thu Apr 17 16:49:28 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 12:49:28 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> Message-ID: <1123.165.106.200.207.1208450968.squirrel@webmail> > If you have a spare NIC, and the nodes are next to each other, you could > make them use a cross-over cable for their cluster communication, so they > would notice that they are both still up even when the switch dies. That's > what I do. I had considered this option but I haven't tried it. One thing I was wondering is how the cluster knows which network interface should get the cluster service IP address in that situation. Right now, I don't have anything in my cluster.conf that specifies this, but it just seems to work. I figured that if I tried to use a crossover cable, what I would need to do is use /etc/hosts to create hostnames on this little private network (consisting of just the 2 nodes connected by a cable) and use those hostnames as the node hostnames in cluster.conf. If I did that, would the cluster services try to assign the cluster service IP to the interface with the crossover cable (when obviously what I want is to assign it to the outward-facing interface)? -Andrew L From alacey at brynmawr.edu Thu Apr 17 16:55:06 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 12:55:06 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> Message-ID: <1127.165.106.200.207.1208451306.squirrel@webmail> > There's an argument that if your switch is down for 30 minutes, you > have bigger problems. If you have a 30 minute switch outage, the chances > are that you can live with the node power-up time on top of that. Point taken, but the problem is that if there is a switch outage and the nodes kill each other, then somebody has to come in, power the nodes back on and make sure everything comes up OK. 
It would be much easier if the nodes would just detect that the switch is down and wait patiently without doing anything (since there is really nothing wrong with the nodes at all, and if they just wait for the switch to come back, everything will be fine.) We do have a history of flaky network here because we're a college...we have a lot of machines on campus that we don't control (student-owned) and we get weird traffic, rogue machines, etc. more frequently than a locked-down corporate environment. I want to make sure that one of those network events doesn't needlessly bring down our mail service, which is what will be running on this cluster. -Andrew L From Harri.Paivaniemi at tietoenator.com Thu Apr 17 17:03:52 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Thu, 17 Apr 2008 20:03:52 +0300 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? References: <1032.165.106.200.207.1208447225.squirrel@webmail><36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com><1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E20@dino.eu.tieto.com> If you just want to have a cluster where client network can be down infinitely without cluster to take actions, you have to run cluster heartbeat via cross-cable and deny cluster's link monitoring in client interface. Or then start using qdisk and build heuristics. Note, that in RHCS 5 deadnode_timeout doesn't exist anymore in /proc. It's totem token there, but havn't checked where it lives in /proc or maby it's in /sys nowadays. -hjp -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Andrew Lacey Sent: Thu 4/17/2008 19:55 To: linux clustering Subject: RE: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? > There's an argument that if your switch is down for 30 minutes, you > have bigger problems. If you have a 30 minute switch outage, the chances > are that you can live with the node power-up time on top of that. Point taken, but the problem is that if there is a switch outage and the nodes kill each other, then somebody has to come in, power the nodes back on and make sure everything comes up OK. It would be much easier if the nodes would just detect that the switch is down and wait patiently without doing anything (since there is really nothing wrong with the nodes at all, and if they just wait for the switch to come back, everything will be fine.) We do have a history of flaky network here because we're a college...we have a lot of machines on campus that we don't control (student-owned) and we get weird traffic, rogue machines, etc. more frequently than a locked-down corporate environment. I want to make sure that one of those network events doesn't needlessly bring down our mail service, which is what will be running on this cluster. -Andrew L -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3922 bytes Desc: not available URL: From Bennie_R_Thomas at raytheon.com Thu Apr 17 17:03:43 2008 From: Bennie_R_Thomas at raytheon.com (Bennie Thomas) Date: Thu, 17 Apr 2008 12:03:43 -0500 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? 
In-Reply-To: <1127.165.106.200.207.1208451306.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> Message-ID: <480782EF.6070803@raytheon.com> Turn port security on to rid rogue machines. As Gordan suggested use a private interface for the cluster communications and that will resolve the issue with the switch going down. If you use the point-to-point nic then you will have to reconfigure your cluster to use the new nodenames assigned to the private lan. Andrew Lacey wrote: >> There's an argument that if your switch is down for 30 minutes, you >> have bigger problems. If you have a 30 minute switch outage, the chances >> are that you can live with the node power-up time on top of that. >> > > Point taken, but the problem is that if there is a switch outage and the > nodes kill each other, then somebody has to come in, power the nodes back > on and make sure everything comes up OK. It would be much easier if the > nodes would just detect that the switch is down and wait patiently without > doing anything (since there is really nothing wrong with the nodes at all, > and if they just wait for the switch to come back, everything will be > fine.) > > We do have a history of flaky network here because we're a college...we > have a lot of machines on campus that we don't control (student-owned) and > we get weird traffic, rogue machines, etc. more frequently than a > locked-down corporate environment. I want to make sure that one of those > network events doesn't needlessly bring down our mail service, which is > what will be running on this cluster. > > -Andrew L > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From gordan at bobich.net Thu Apr 17 17:06:13 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 18:06:13 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1123.165.106.200.207.1208450968.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> Message-ID: On Thu, 17 Apr 2008, Andrew Lacey wrote: >> If you have a spare NIC, and the nodes are next to each other, you could >> make them use a cross-over cable for their cluster communication, so they >> would notice that they are both still up even when the switch dies. That's >> what I do. > > I had considered this option but I haven't tried it. One thing I was > wondering is how the cluster knows which network interface should get the > cluster service IP address in that situation. Whichever interface has the IPs on the right subnet. Your public interface has the public/fail-over IPs. The private cluster interface has a pair of private IPs on a network of their own. No resource groups should be assigned to that interface. It's there just for intra-cluster communication (e.g. dlm, san/drbd, etc.). > Right now, I don't have > anything in my cluster.conf that specifies this, but it just seems to > work. I figured that if I tried to use a crossover cable, what I would > need to do is use /etc/hosts to create hostnames on this little private > network (consisting of just the 2 nodes connected by a cable) and use > those hostnames as the node hostnames in cluster.conf. That works. 
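A sketch of that, with invented names and addresses:

  # /etc/hosts on both nodes
  10.10.10.1   nodeA-clu   # crossover link
  10.10.10.2   nodeB-clu
  192.0.2.11   nodeA       # public
  192.0.2.12   nodeB

  <!-- cluster.conf then refers to the private names -->
  <clusternode name="nodeA-clu" nodeid="1" votes="1"/>
  <clusternode name="nodeB-clu" nodeid="2" votes="1"/>

Heartbeat and locking then run over the crossover link, while the service IPs stay on the public subnet, as described below.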
> If I did that, > would the cluster services try to assign the cluster service IP to the > interface with the crossover cable (when obviously what I want is to > assign it to the outward-facing interface)? It will assign the IPs to whatever interface already has an IP on that subnet. i.e. if your private cluster interface (crossover one) is 192.168.0.0/16 and your public interface is 10.0.0.0/8, you will have a resource group with IPs on the 10.0.0.0/8 subnet, not on the 192.168.0.0/16 subnet. You will probably want to add additional monitoring against switch port failures here, as otherwise if the switch port of the master node fails (it does happen, I've seen many a switch with just 1-2 dead ports), the backup will not notice as it can verify that the primary is up and responding, and it will not fence it and fail over to itself. You'd end up with a working cluster but unavailable service. IIRC there is a monitor_link option in the resource spec for this kind of thing. Gordan From gordan at bobich.net Thu Apr 17 17:09:10 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 18:09:10 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1127.165.106.200.207.1208451306.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> Message-ID: On Thu, 17 Apr 2008, Andrew Lacey wrote: >> There's an argument that if your switch is down for 30 minutes, you >> have bigger problems. If you have a 30 minute switch outage, the chances >> are that you can live with the node power-up time on top of that. > > Point taken, but the problem is that if there is a switch outage and the > nodes kill each other, then somebody has to come in, power the nodes back > on and make sure everything comes up OK. It would be much easier if the > nodes would just detect that the switch is down and wait patiently without > doing anything (since there is really nothing wrong with the nodes at all, > and if they just wait for the switch to come back, everything will be > fine.) How do you propose to differentiate between a network outage that should instigate fencing and one that shouldn't? > We do have a history of flaky network here because we're a college...we > have a lot of machines on campus that we don't control (student-owned) and > we get weird traffic, rogue machines, etc. more frequently than a > locked-down corporate environment. I want to make sure that one of those > network events doesn't needlessly bring down our mail service, which is > what will be running on this cluster. The cross-over cluster interface without a switch would probably be the best solution. That coupled with a varying fencing timeout should do most of what you seem to want to achieve. Gordan From gordan at bobich.net Thu Apr 17 17:10:51 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Thu, 17 Apr 2008 18:10:51 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? 
In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E20@dino.eu.tieto.com> References: <1032.165.106.200.207.1208447225.squirrel@webmail><36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com><1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E20@dino.eu.tieto.com> Message-ID: On Thu, 17 Apr 2008, Harri.Paivaniemi at tietoenator.com wrote: > If you just want to have a cluster where client network can be down > infinitely without cluster to take actions, > you have to run cluster heartbeat via cross-cable and deny cluster's > link monitoring in client interface. > > Or then start using qdisk and build heuristics. At that rate you might as well just not bother specifying a fencing device - the whole cluster will just lock up until the network comes back and it can re-connect and re-establish quorum. > Note, that in RHCS 5 deadnode_timeout doesn't exist anymore in /proc. > It's totem token there, but havn't checked where it lives in /proc or > maby it's in /sys nowadays. Thanks for that. :-) Gordan From Bennie_R_Thomas at raytheon.com Thu Apr 17 17:10:29 2008 From: Bennie_R_Thomas at raytheon.com (Bennie Thomas) Date: Thu, 17 Apr 2008 12:10:29 -0500 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1123.165.106.200.207.1208450968.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> Message-ID: <48078485.40503@raytheon.com> Let's say you have 2-nodes. "nodeA" and "nodeB" (this is how the public sees them). create "private" nodenames like "nodeAe" and "nodeBe". Add the nodenames to both hosts files, make sure the private interfaces are set up with private addresses, reconfigure your cluster to use the "private" nodename. Then if you want a Cluster Alias IP address that is known to the public you assign another public address as a resource then add it to a service. Andrew Lacey wrote: >> If you have a spare NIC, and the nodes are next to each other, you could >> make them use a cross-over cable for their cluster communication, so they >> would notice that they are both still up even when the switch dies. That's >> what I do. >> > > I had considered this option but I haven't tried it. One thing I was > wondering is how the cluster knows which network interface should get the > cluster service IP address in that situation. Right now, I don't have > anything in my cluster.conf that specifies this, but it just seems to > work. I figured that if I tried to use a crossover cable, what I would > need to do is use /etc/hosts to create hostnames on this little private > network (consisting of just the 2 nodes connected by a cable) and use > those hostnames as the node hostnames in cluster.conf. If I did that, > would the cluster services try to assign the cluster service IP to the > interface with the crossover cable (when obviously what I want is to > assign it to the outward-facing interface)? > > -Andrew L > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From alacey at brynmawr.edu Thu Apr 17 17:18:55 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 13:18:55 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? 
In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> Message-ID: <1470.165.106.200.207.1208452735.squirrel@webmail> > It will assign the IPs to whatever interface already has an IP on that > subnet. i.e. if your private cluster interface (crossover one) is > 192.168.0.0/16 and your public interface is 10.0.0.0/8, you will have a > resource group with IPs on the 10.0.0.0/8 subnet, not on the > 192.168.0.0/16 subnet. > You will probably want to add additional monitoring against switch port > failures here, as otherwise if the switch port of the master node fails > (it does happen, I've seen many a switch with just 1-2 dead ports), > the backup will not notice as it can verify that the primary is up and > responding, and it will not fence it and fail over to itself. You'd end up > with a working cluster but unavailable service. IIRC there is a > monitor_link option in the resource spec for this kind of thing. Very informative post...thanks! The scenario you mentioned with a dead switch port (or a single unplugged network cable, or whatever) is something I had thought about, and I considered it to be a strike against using a crossover cable. But, this "monitor_link" sounds like it might be exactly what I've been looking for. I'll research that and see what I can find. You asked in your other post how I can tell the difference between a network outage that should cause a fence and one that shouldn't. What I wanted to do was set it up so that a node that can't reach the switch will never try to fence the other node. That way, if the switch is down and nobody can reach it, then nobody will fence. If there is a single port failure and one node can still reach the switch, then it will fence the other node and take over the services. Thanks, -Andrew L From gordan at bobich.net Thu Apr 17 18:25:26 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 17 Apr 2008 19:25:26 +0100 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1470.165.106.200.207.1208452735.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> <1470.165.106.200.207.1208452735.squirrel@webmail> Message-ID: <48079616.3050108@bobich.net> Andrew Lacey wrote: > Very informative post...thanks! The scenario you mentioned with a dead > switch port (or a single unplugged network cable, or whatever) is > something I had thought about, and I considered it to be a strike against > using a crossover cable. How does that follow? With a switch in the middle your points of failure are: cable, switch, cable With just a crossover cable (actually, it doesn't have to be crossover - 99% of NICs made in the past few years auto-detect and auto-negotiate whether they need to cross-over or not, so you can just use a straight-through cable - but that's getting off topic), you only have a single cable as a point of failure. That is certainly better than the alternative. > But, this "monitor_link" sounds like it might be > exactly what I've been looking for. I'll research that and see what I can > find. You don't need that on your cluster interface though. If the NIC or cable die, cluster will lose the connection to the other node and fence it. If you have something like iLO on multiple interfaces, you can specify multiple fencing devices, to ensure that you manage to fence the other node, regardless of which interface fails. 
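(For illustration only: multiple fencing paths are written in cluster.conf as ordered <method> blocks under a node's <fence> section, each referencing a <fencedevice> defined elsewhere; the device names below are invented. fenced tries method "1" first and falls back to method "2" only if the first fails.)

    <clusternode name="nodeA" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-nodeA-path1"/>
        </method>
        <method name="2">
          <device name="ilo-nodeA-path2"/>
        </method>
      </fence>
    </clusternode>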
But the crossover interface connecting the nodes is arguably the most reliable part of your 2-node cluster because it has the fewest components. > You asked in your other post how I can tell the difference between a > network outage that should cause a fence and one that shouldn't. What I > wanted to do was set it up so that a node that can't reach the switch will > never try to fence the other node. That way, if the switch is down and > nobody can reach it, then nobody will fence. If there is a single port > failure and one node can still reach the switch, then it will fence the > other node and take over the services. Is your switch managed? If so, you can use this as a fencing device simply have a node disable the other node's port. That way any subsequent attempts by the other node, to fence or do anything else, will not get anywhere. You may need to write your own fencing agent for that, though. I asked for fencing agent API in a post earlier, and there appears to be no conclusive documentation for this. I've been meaning to implement a fencing agent for exactly this sort of thing (fencing by disabling the switch port) on a 3Com switch. Gordan From lhh at redhat.com Thu Apr 17 18:42:33 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 17 Apr 2008 14:42:33 -0400 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <4807079A.5060909@bobich.net> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> Message-ID: <1208457753.6053.138.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-04-17 at 09:17 +0100, Gordan Bobic wrote: > Harri.Paivaniemi at tietoenator.com wrote: > > 1. 2-node cluster. Can't start only one node to get cluster services up - it hangs in fencing and waits until I start te second node and immediately after that, when both nodes are starting cman, the cluster comes up. So if I have lost one node, I can't get the cluster up, if I have to restart for seome reason the working node. It should work like before (both nodes are down, I start one, it fences another and comes up). Now it just waits... log says: > > > > ccsd[25272]: Error while processing connect: Connection refused > > > > This is so common error message, that it just tell's nothing to me.... > > I have seen similar error messages before, and it has usually been > caused by either the node names/interfaces/IPs not being listed > correctly in /etc/hosts file, or iptables firewalling rules blocking > communication between the nodes. It's probably also partly the cluster not being quorate. ccsd is very verbose, and it logs errors perhaps when it shouldn't... > > 2. qdisk doesn't work. 2- node cluster. Start it (both nodes at the same time) to get it up. Works ok, qdisk works, heuristic works. Everything works. If I stop cluster daemons on one node, that node can't join to cluster anymore without a complete reboot. It joins, another node says ok, the node itself says ok, quorum is registred and heuristic is up, but the node's quorum-disk stays offline and another node says this node is offline. If I reboot this machine, it joins to cluster ok. > I believe it's supposed to work that way. When a node fails it needs to > be fully restarted before it is allowed back into the cluster. I'm sure > this has been mentioned on the list recently. If you cleanly stop the cluster daemons, fencing shouldn't be needed here. If the node's not getting allowed in to the cluster, there's some reason for it. 
A way to tell if a node's being rejected is: cman_tool status If you see 'DisallowedNodes' (I think?), the current "quorate" partition thinks that the other node needs to be fenced. I don't remember the cases that lead to this situation, though. Anyway, clean stop of the cluster should never require fencing. > > 3. Funny thing: heuristic ping didn't work at all in the beginning and support gave me a "ping-script" which make it to work... so this describes quite well how experimental this cluster is nowadays... > > > > I have to tell you it is a FACT that basics are ok: fencing works ok in a normal situation, I don't have typos, configs are in sync, everything is ok, but these problems still exists. > > I've been in similar situations before, but in the end it always turned > out to be me doing something silly (see above re: host files and > iptables as examples). Need for the ping-script is definitely a bug. It's because ping uses signals to wake itself up, and qdiskd blocked those signals before fork (and of course, ping doesn't unblock signals itself). It's fixed in current 4.6.z/5.1.z errata (IIRC) and definitely in 5.2 beta. > > I have 2 times sent sosreports etc. so RH support. They hava spent 3 weeks and still can't say whats wrong... > Sadly, that seems to be the quality of commercial support from any > vendor. Support nowdays seems to have only one purpose - managerial > back-covering exercise so they can pass the buck. It's unfortunate that this is the conception. -- Lon From alacey at brynmawr.edu Thu Apr 17 18:51:55 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Thu, 17 Apr 2008 14:51:55 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <48079616.3050108@bobich.net> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> <1470.165.106.200.207.1208452735.squirrel@webmail> <48079616.3050108@bobich.net> Message-ID: <1752.165.106.200.207.1208458315.squirrel@webmail> >> Very informative post...thanks! The scenario you mentioned with a dead >> switch port (or a single unplugged network cable, or whatever) is >> something I had thought about, and I considered it to be a strike >> against >> using a crossover cable. > > How does that follow? With a switch in the middle your points of failure > are: > cable, switch, cable The potential problem with the crossover cable design is: Although the cluster communication goes over the crossover cable, the path to the switch is used for user connections to the cluster service. Suppose node 1 is active and node 2 is standby. Node 1 loses its connection to the switch for whatever reason, but node 2 doesn't. Since the heartbeat goes across the crossover cable, the nodes think nothing is wrong, so no failover occurs and the service is not reachable to users. If the service had failed over to node 2 (which can still talk to the switch), it would be reachable to users. Eliminating the crossover cable and sending the cluster traffic through the switch eliminates this problem nicely -- both nodes try to fence, but node 1 can't reach anything, so node 2 kills node 1 and the service is up on node 2. But then, of course, you have the pathological case when neither node can talk to the switch until the downed switch comes back up, and boom, they both fence each other. Maybe the monitor_link option in conjunction with the crossover-cable heartbeat will fix this. I'm in the process of setting that up right now, so I'll post back when I have a result. 
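For reference, a minimal sketch of the cluster.conf pieces this involves. The node names follow the private "nodeAe"/"nodeBe" convention suggested earlier in the thread, and the service address is made up:

    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="nodeAe" votes="1"/>
      <clusternode name="nodeBe" votes="1"/>
    </clusternodes>
    <rm>
      <service name="mail" autostart="1">
        <ip address="10.0.0.50" monitor_link="1"/>
      </service>
    </rm>

With monitor_link="1" the status check on the IP resource fails when carrier is lost on the outward-facing interface, so the service can relocate even though the crossover heartbeat is still up.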
-Andrew L From lhh at redhat.com Thu Apr 17 18:54:56 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 17 Apr 2008 14:54:56 -0400 Subject: [Linux-cluster] Meaning of Cluster Cycle and timeout problems In-Reply-To: References: Message-ID: <1208458496.6053.150.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-04-17 at 09:08 +0200, Peter wrote: > Hi! > > In our Cluster we have the following entry in the "messages" logfile: > > "qdiskd[4314]: qdisk cycle took more than 3 seconds to > complete (3.890000)" It means it took more than 3 seconds for one qdiskd cycle to complete. This is a whole lot: 8192 bytes in 16 block reads some internal calculations 512 bytes in 1 block write (that's it...) > These messages are very frequent. I can not find anything except the > source code via google and i am sorry to say that i am not so familiar > with c to get the point. > > > We also have sometimes a quorum timeout: > > "kernel: CMAN: Quorum device /dev/sdh timed out" > > > Are these two messages independent and what is the meaning of the > first message? No, they're 100% related. It sounds like qdiskd is getting starved for I/O to /dev/sdh, or possibly it's getting CPU-starved for some reason. Being that it's more or less a real-time program which helps keep the cluster running, that's bad! In your case, it's getting hung up for longer than the cluster failover time, so CMAN thinks qdiskd has died. Not good. (1) Turn *off* status_file if you have it enabled! It's for debugging, and under certain load patterns, it can really slow down qdiskd. (2) If you think it's I/O, what you should try is (assuming you're using cluster2/rhel5/centos5/etc. here): echo deadline > /sys/block/sdh/queue/scheduler If you had a default of 10 seconds (1 interval 10 tko), you should also do: echo 2500 > /sys/block/sdh/queue/iosched/write_expire ... you've got at least 3 for interval, so I'm not sure this would apply to you. [NOTE: On rhel4/centos4/stable, I think you have to set the I/O scheduler globally in the kernel command line at system boot.] (3) If you think qdiskd is getting CPU starved, you can adjust the 'scheduler' and 'priority' values in cluster.conf to something different. I think the man page might be wrong; I think the highest 'priority' value for the 'rr' scheduler is 99, not 100. See the qdisk(5) man page for more information on those. -- Lon From lhh at redhat.com Thu Apr 17 18:56:29 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 17 Apr 2008 14:56:29 -0400 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> Message-ID: <1208458589.6053.152.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-04-17 at 17:46 +0100, gordan at bobich.net wrote: > On Thu, 17 Apr 2008, Andrew Lacey wrote: > > >> but you could also just tune deadnode_timeout to be different on both > >> nodes: this results the behaviour Gordan told - the node that has smaller > >> deadnode_timeout would fence first. > > > > Would this work in a situation where the switch was down for a few > > minutes? Suppose the deadnode_timeout is 30 seconds on one node and 60 > > seconds on the other. So, after 60 seconds of switch downtime, both nodes > > would be trying to fence. If the switch comes up after being down for 5 > > minutes, they would still immediately fence each other. Or am I not > > thinking about this correctly?
> > There's an argument that if your switch is down for 30 minutes, you > have bigger problems. If you have a 30 minute switch outage, the chances > are that you can live with the node power-up time on top of that. ... or an argument that maybe the 'sleep' delay in a fencing agent on a given node isn't necessarily a bad thing after all :D -- Lon From lhh at redhat.com Thu Apr 17 19:03:08 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 17 Apr 2008 15:03:08 -0400 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E20@dino.eu.tieto.com> Message-ID: <1208458988.6053.157.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-04-17 at 18:10 +0100, gordan at bobich.net wrote: > On Thu, 17 Apr 2008, Harri.Paivaniemi at tietoenator.com wrote: > > > If you just want to have a cluster where client network can be down > > infinitely without cluster to take actions, > > you have to run cluster heartbeat via cross-cable and deny cluster's > > link monitoring in client interface. > > > > Or then start using qdisk and build heuristics. > > At that rate you might as well just not bother specifying a fencing device > - the whole cluster will just lock up until the network comes back and it > can re-connect and re-establish quorum. > > > Note, that in RHCS 5 deadnode_timeout doesn't exist anymore in /proc. > > It's totem token there, but havn't checked where it lives in /proc or > > maby it's in /sys nowadays. > > Thanks for that. :-) It's just cluster.conf at this point. It's not possible to specify different timeouts on different nodes as you can with deadnode_timeout. Of course, I think doing different deadnode_timeouts is kind of nuts :D -- Lon From gordan at bobich.net Thu Apr 17 21:28:54 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 17 Apr 2008 22:28:54 +0100 Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <1208458988.6053.157.camel@ayanami.boston.devel.redhat.com> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E1F@dino.eu.tieto.com> <1122.165.106.200.207.1208450648.squirrel@webmail> <1127.165.106.200.207.1208451306.squirrel@webmail> <36F4E74FA8263744A6B016E6A461EFF603317E20@dino.eu.tieto.com> <1208458988.6053.157.camel@ayanami.boston.devel.redhat.com> Message-ID: <4807C116.1000609@bobich.net> Lon Hohberger wrote: >>> Note, that in RHCS 5 deadnode_timeout doesn't exist anymore in /proc. >>> It's totem token there, but havn't checked where it lives in /proc or >>> maby it's in /sys nowadays. >> Thanks for that. :-) > > It's just cluster.conf at this point. > > It's not possible to specify different timeouts on different nodes as > you can with deadnode_timeout. Of course, I think doing different > deadnode_timeouts is kind of nuts :D Not at all. It is _vital_ when you only have 2 nodes. 
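For illustration, on RHEL4/cluster-1 (where this tunable lives in /proc, as shown later in the thread) the asymmetry could be set up like this; the values are arbitrary, the point is only that they differ:

    # node A, e.g. in /etc/rc.local -- keeps the default and wins the fencing race
    echo 21 > /proc/cluster/config/cman/deadnode_timeout

    # node B -- waits longer before declaring node A dead
    echo 60 > /proc/cluster/config/cman/deadnode_timeout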
Gordan From barbos at gmail.com Fri Apr 18 01:10:47 2008 From: barbos at gmail.com (Alex Kompel) Date: Thu, 17 Apr 2008 18:10:47 -0700 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 In-Reply-To: <1208428065.21043.60.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> Message-ID: <3ae027040804171810p6dffb05dp13f1a7fb9a0ddbae@mail.gmail.com> 2008/4/17 Harri P?iv?niemi : > > The 2nd problem that still exists is: > > When node a and b are running and everything is ok. I stop node b's > cluster daemons. when I start node b again, this situation stays > forever: > > ---------------- > node a - clustat > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, Local, rgmanager > areenasql2 2 Offline > /dev/sda 0 Online, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > ------------------- > > node b - clustat > > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, rgmanager > areenasql2 2 Online, Local, rgmanager > /dev/sda 0 Offline, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > > So node b's quorum disk is offline, log says it's registred ok and > heuristic is UP... node a sees node b as offline. If I reboot node b, it > works ok and joins ok... Now that you have mentioned it - I remember stumbling upon the similar problem. It happens if you restart the cluster services before the cluster realizes the node is dead. I guess it is a bug since the node is in some sort of limbo state at that moment reporting itsefl being part of the cluster while the cluster does not recognize it as a member. If you wait 70 seconds ( cluster.conf: ) before starting the cluster services then it will come up fine. The reboot works for you because it take longer than 70 sec (correct me if I am wrong). So try stopping node b cluster services, wait 70 secs and then start them back up. -Alex From Harri.Paivaniemi at tietoenator.com Fri Apr 18 04:23:29 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Fri, 18 Apr 2008 07:23:29 +0300 Subject: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> <3ae027040804171810p6dffb05dp13f1a7fb9a0ddbae@mail.gmail.com> Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E21@dino.eu.tieto.com> Oh my dear Alex, It really goes that way! - I just can't believe - you are one hell of a genious. I havn't had a clue about it could be something this simple. It really works. I feel stupid. So, I was really driving grazy with this cluster ver 5 yesterday, but now it seems that both of my problems are solved: 1. 
unable to bring just one node up in 2-node cluster - hanging in fencing / fence failed Reason: cman was told (by RH) to be started before qdisk and this is wrong way. Qdisk have to be started first in this situation, so fence_tool is not wondering why cluster is not quorate ;) 2. restart of cluster daemons not succesfull Reason: You have to wait "token timeout" before starting again ;) Great. Thanks for all you. RH support has been thinking these problems 3 weeks now without success. -hjp -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Alex Kompel Sent: Fri 4/18/2008 4:10 To: linux clustering Subject: Re: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1 2008/4/17 Harri P?iv?niemi : > > The 2nd problem that still exists is: > > When node a and b are running and everything is ok. I stop node b's > cluster daemons. when I start node b again, this situation stays > forever: > > ---------------- > node a - clustat > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, Local, rgmanager > areenasql2 2 Offline > /dev/sda 0 Online, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > ------------------- > > node b - clustat > > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > areenasql1 1 Online, rgmanager > areenasql2 2 Online, Local, rgmanager > /dev/sda 0 Offline, Quorum Disk > > Service Name Owner (Last) State > ------- ---- ----- ------ ----- > service:areena areenasql1 started > > > So node b's quorum disk is offline, log says it's registred ok and > heuristic is UP... node a sees node b as offline. If I reboot node b, it > works ok and joins ok... Now that you have mentioned it - I remember stumbling upon the similar problem. It happens if you restart the cluster services before the cluster realizes the node is dead. I guess it is a bug since the node is in some sort of limbo state at that moment reporting itsefl being part of the cluster while the cluster does not recognize it as a member. If you wait 70 seconds ( cluster.conf: ) before starting the cluster services then it will come up fine. The reboot works for you because it take longer than 70 sec (correct me if I am wrong). So try stopping node b cluster services, wait 70 secs and then start them back up. -Alex -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 4652 bytes Desc: not available URL: From harri.paivaniemi at tietoenator.com Fri Apr 18 06:26:34 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Fri, 18 Apr 2008 09:26:34 +0300 Subject: [Linux-cluster] Is heuristic really required? In-Reply-To: References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> Message-ID: <1208499994.21043.81.camel@hjpsuse.tebit> Hi, Qdisk man says at least 1 heuristic is reguired. Is it? I have (accidentally) tested and to my mind it worked fine without heuristics. 
I gave 1 vote to quorumd and no - tag at all and it seemed to work normally like 3-vote cluster should... -hjp From harri.paivaniemi at tietoenator.com Fri Apr 18 07:48:55 2008 From: harri.paivaniemi at tietoenator.com (Harri =?ISO-8859-1?Q?P=E4iv=E4niemi?=) Date: Fri, 18 Apr 2008 10:48:55 +0300 Subject: [Linux-cluster] Is heuristic really required? In-Reply-To: <1208499994.21043.81.camel@hjpsuse.tebit> References: <36F4E74FA8263744A6B016E6A461EFF603317E1B@dino.eu.tieto.com> <4807079A.5060909@bobich.net> <1208423931.27774.2.camel@admc.win-rar.local> <1208424527.21043.33.camel@hjpsuse.tebit> <1208424763.27774.6.camel@admc.win-rar.local> <1208426290.21043.48.camel@hjpsuse.tebit> <1208427177.27774.8.camel@admc.win-rar.local> <1208428065.21043.60.camel@hjpsuse.tebit> <1208499994.21043.81.camel@hjpsuse.tebit> Message-ID: <1208504935.21043.83.camel@hjpsuse.tebit> I'll answer myself, Of course to be sure I can put "exit 0" to heuristic so it's officially disabled... -hjp On Fri, 2008-04-18 at 09:26 +0300, Harri P?iv?niemi wrote: > Hi, > > Qdisk man says at least 1 heuristic is reguired. > > Is it? > > I have (accidentally) tested and to my mind it worked fine without > heuristics. I gave 1 vote to quorumd and no - tag at all and > it seemed to work normally like 3-vote cluster should... > > > > -hjp > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From admin.cluster at gmail.com Fri Apr 18 13:21:28 2008 From: admin.cluster at gmail.com (Anthony) Date: Fri, 18 Apr 2008 15:21:28 +0200 Subject: [Linux-cluster] Replacing a RAID 5 disk, then... Message-ID: <4808A058.50704@gmail.com> Hello, i had a problem with one of my 5 disks of my sun v40z server, one of the disks had gone out of service, and i had that Beep sound , notifying me of a raid 5 problem, so i ordered a new one, and replaced it , the beep sound is still on, i think that the raid 5 is beeing re-constructed!?!?! now i want to know, what are the RedHat commands to issue to see what is happennig, and what is the state of my raid-5. i am under RedHat Enterprise Linux AS 4.2. Regards, Anthony. From Derek.Anderson at compellent.com Fri Apr 18 13:52:14 2008 From: Derek.Anderson at compellent.com (Derek Anderson) Date: Fri, 18 Apr 2008 08:52:14 -0500 Subject: [Linux-cluster] Replacing a RAID 5 disk, then... In-Reply-To: <4808A058.50704@gmail.com> References: <4808A058.50704@gmail.com> Message-ID: <99E0F1976E2DA2499F3E6EB18B25F036042E7301@honeywheat.Beer.Town> Anthony, When the failed disk was replaced a RAID rebuild should have been initiated; that is a good thing. The RAID device will continue to operate in degraded mode until the rebuild has completed. I assume that's why it is still beeping at you. And the fact that something is beeping at you makes it sound like your RAID is managed in hardware, in which case operating system commands aren't going to give you an indication of what is happening. Is this true? Are your disks contained in an external disk enclosure? If so, what kind. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Anthony Sent: Friday, April 18, 2008 8:21 AM To: linux clustering Subject: [Linux-cluster] Replacing a RAID 5 disk, then... 
Hello, i had a problem with one of my 5 disks of my sun v40z server, one of the disks had gone out of service, and i had that Beep sound , notifying me of a raid 5 problem, so i ordered a new one, and replaced it , the beep sound is still on, i think that the raid 5 is beeing re-constructed!?!?! now i want to know, what are the RedHat commands to issue to see what is happennig, and what is the state of my raid-5. i am under RedHat Enterprise Linux AS 4.2. Regards, Anthony. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From matthias.schlarb at sap.com Fri Apr 18 13:55:44 2008 From: matthias.schlarb at sap.com (Matthias Schlarb) Date: Fri, 18 Apr 2008 15:55:44 +0200 Subject: [Linux-cluster] operation configuration: start timeout Message-ID: <4808A860.4070300@sap.com> Hello, how can I add a start timeout for a cluster resource in cluster.conf? And what is the default there respectively how much time does a resource agent have to start his resource before the cluster assumes that the start has failed? The PDF and the system-config-cluster doesn't provide anything about that and I don't have a reference for the config xml. Many thanks in advance and best regards, -- Matthias Schlarb From admin.cluster at gmail.com Fri Apr 18 14:05:48 2008 From: admin.cluster at gmail.com (Anthony) Date: Fri, 18 Apr 2008 16:05:48 +0200 Subject: [Linux-cluster] Replacing a RAID 5 disk, then... In-Reply-To: <99E0F1976E2DA2499F3E6EB18B25F036042E7301@honeywheat.Beer.Town> References: <4808A058.50704@gmail.com> <99E0F1976E2DA2499F3E6EB18B25F036042E7301@honeywheat.Beer.Town> Message-ID: <4808AABC.5030201@gmail.com> Dear Derek, Thanks a lot for your answer, indeed, i have a Hardware Raid 5, it is an internal MegaRaid product. i hope that the reconstruction will be ok. Any ideas how much time the reconstruction will take (it is 5*146GB SCSI HD)? Regards, Anthony. Derek Anderson wrote: > Anthony, > > When the failed disk was replaced a RAID rebuild should have been > initiated; that is a good thing. The RAID device will continue to > operate in degraded mode until the rebuild has completed. I assume > that's why it is still beeping at you. > > And the fact that something is beeping at you makes it sound like your > RAID is managed in hardware, in which case operating system commands > aren't going to give you an indication of what is happening. Is this > true? Are your disks contained in an external disk enclosure? If so, > what kind. > > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Anthony > Sent: Friday, April 18, 2008 8:21 AM > To: linux clustering > Subject: [Linux-cluster] Replacing a RAID 5 disk, then... > > Hello, > > i had a problem with one of my 5 disks of my sun v40z server, > one of the disks had gone out of service, and i had that Beep sound , > notifying me of a raid 5 problem, > so i ordered a new one, and replaced it , the beep sound is still on, i > think that the raid 5 is beeing re-constructed!?!?! > now i want to know, what are the RedHat commands to issue to see what is > > happennig, and what is the state of my raid-5. > > i am under RedHat Enterprise Linux AS 4.2. > > Regards, > Anthony. 
> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From isplist at logicore.net Fri Apr 18 16:35:38 2008 From: isplist at logicore.net (isplist at logicore.net) Date: Fri, 18 Apr 2008 11:35:38 -0500 Subject: [Linux-cluster] XEN and IBM x440 Message-ID: <2008418113538.293228@leena> Anyone using the IBM x440 and has installed rhel50 xen on it? After install, it can't seem to find the root and other files so just keeps rebooting. Mike From alacey at brynmawr.edu Fri Apr 18 16:55:22 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Fri, 18 Apr 2008 12:55:22 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <48079616.3050108@bobich.net> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> <1470.165.106.200.207.1208452735.squirrel@webmail> <48079616.3050108@bobich.net> Message-ID: <4947.165.106.200.207.1208537722.squirrel@webmail> Just an update...I set up the cluster to communicate over a crossover cable, and to monitor_link for the public IP address. This works great for the scenario where one node loses its public network link (services fail over to the other node, without fencing), and reasonably well for the scenario where both lose their public links (cluster services stop cleanly after both nodes realize they lost their links, but nothing is fenced). The only remaining thing I want to do is get the deadnode_timeout set up so that if the crossover cable link is lost, one node will always fence the other first (rather than both at the same time). I tried just changing the value stored in /proc/cluster/config/cman/deadnode_timeout, but this does not hold after a reboot (it changes back to default 21). Does anyone know the right way to change this value on one node only? -Andrew L From gordan at bobich.net Fri Apr 18 17:10:10 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Fri, 18 Apr 2008 18:10:10 +0100 (BST) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: <4947.165.106.200.207.1208537722.squirrel@webmail> References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> <1470.165.106.200.207.1208452735.squirrel@webmail> <48079616.3050108@bobich.net> <4947.165.106.200.207.1208537722.squirrel@webmail> Message-ID: On Fri, 18 Apr 2008, Andrew Lacey wrote: > The only remaining thing I want to do is get the deadnode_timeout set up > so that if the crossover cable link is lost, one node will always fence > the other first (rather than both at the same time). I tried just changing > the value stored in /proc/cluster/config/cman/deadnode_timeout, but this > does not hold after a reboot (it changes back to default 21). Does anyone > know the right way to change this value on one node only? echo "100" > /proc/cluster/config/cman/deadnode_timeout in /etc/rc.local? Gordan From jerlyon at gmail.com Fri Apr 18 19:49:50 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Fri, 18 Apr 2008 13:49:50 -0600 Subject: [Linux-cluster] script resource start, stop and status timeouts Message-ID: <779919740804181249k624a6a95le2ffe7c81319bf4@mail.gmail.com> Hi, I'm currently using cluster on RHEL 4.6 and will be soon using moving to cluster on RHEL 5.1. 
We are using some script resources and I'm trying to find if there are timeouts on the start, stop and status functions. If so, what are the defaults and can they be tuned? TIA Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From alacey at brynmawr.edu Fri Apr 18 20:00:54 2008 From: alacey at brynmawr.edu (Andrew Lacey) Date: Fri, 18 Apr 2008 16:00:54 -0400 (EDT) Subject: [Linux-cluster] IP-based tie-breaker on a 2-node cluster? In-Reply-To: References: <1032.165.106.200.207.1208447225.squirrel@webmail> <1123.165.106.200.207.1208450968.squirrel@webmail> <1470.165.106.200.207.1208452735.squirrel@webmail> <48079616.3050108@bobich.net> <4947.165.106.200.207.1208537722.squirrel@webmail> Message-ID: <2519.165.106.200.207.1208548854.squirrel@webmail> > echo "100" > /proc/cluster/config/cman/deadnode_timeout > > in /etc/rc.local? > > Gordan That did it. Thanks! -Andrew L From Harri.Paivaniemi at tietoenator.com Sat Apr 19 03:39:19 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Sat, 19 Apr 2008 06:39:19 +0300 Subject: [Linux-cluster] script resource start, stop and status timeouts References: <779919740804181249k624a6a95le2ffe7c81319bf4@mail.gmail.com> Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E23@dino.eu.tieto.com> Those can be tuned, /usr/share/cluster/script.sh ... is for normal init- scripts. There is also plenty of other scripts there for different purposes. In that script: .... tels it's infinite timeout by default. -hjp -----Original Message----- From: linux-cluster-bounces at redhat.com on behalf of Jeremy Lyon Sent: Fri 4/18/2008 22:49 To: linux-cluster at redhat.com Subject: [Linux-cluster] script resource start, stop and status timeouts Hi, I'm currently using cluster on RHEL 4.6 and will be soon using moving to cluster on RHEL 5.1. We are using some script resources and I'm trying to find if there are timeouts on the start, stop and status functions. If so, what are the defaults and can they be tuned? TIA Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2996 bytes Desc: not available URL: From Harri.Paivaniemi at tietoenator.com Sat Apr 19 04:44:59 2008 From: Harri.Paivaniemi at tietoenator.com (Harri.Paivaniemi at tietoenator.com) Date: Sat, 19 Apr 2008 07:44:59 +0300 Subject: [Linux-cluster] Howto multicast? References: <4808A058.50704@gmail.com><99E0F1976E2DA2499F3E6EB18B25F036042E7301@honeywheat.Beer.Town> <4808AABC.5030201@gmail.com> Message-ID: <36F4E74FA8263744A6B016E6A461EFF603317E24@dino.eu.tieto.com> Please explain me, I havn't use multicasting very much so now I have problems to understand this RHCS 5- communication. I have tought it goes this way: - cman uses either broadcast or multicast nowadays (multicast by default) - openais uses always multicast, with it's default address Is this right? What is the difference there - if we have openais communicating, why there is another thing also communicating? I have been told that openais is cman's communication channel so what the heck? So, If I have 2 clusters in the same subnet, how to tell these things to be different? D. Teigland says: "" When openais is started by cman, the openais.conf file is not used. Many of the configuration parameters listed in openais.conf can be set in cluster.conf instead. See the openais.conf man page for the specific parameters that can be set in these sections. 
Note that settings in the section will override any comparable settings in the openais sections above (in particular, bindnetaddr, mcastaddr, mcastport and nodeid will always be replaced by values in ). "" So, if I put in cluster.conf: ... for each node, what it configures? Cman or openais? You see, if I do that, log says still: -------- openais[15321]: [MAIN ] Using default multicast address of 239.192.204.175 ---------- And then, I could also tell in , that ... but what it configures? Still netstat -g shows only that default address there... So if somebody knows, how this goes, please tell me. Please... -hjp -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3289 bytes Desc: not available URL: From ccaulfie at redhat.com Mon Apr 21 07:17:36 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 21 Apr 2008 08:17:36 +0100 Subject: [Linux-cluster] Howto multicast? In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E24@dino.eu.tieto.com> References: <4808A058.50704@gmail.com><99E0F1976E2DA2499F3E6EB18B25F036042E7301@honeywheat.Beer.Town> <4808AABC.5030201@gmail.com> <36F4E74FA8263744A6B016E6A461EFF603317E24@dino.eu.tieto.com> Message-ID: <480C3F90.4080203@redhat.com> Harri.Paivaniemi at tietoenator.com wrote: > Please explain me, > > I havn't use multicasting very much so now I have problems to understand this RHCS 5- communication. > > I have tought it goes this way: > > - cman uses either broadcast or multicast nowadays (multicast by default) > - openais uses always multicast, with it's default address > > Is this right? What is the difference there - if we have openais communicating, why there is another thing also communicating? I have been told that openais is cman's communication channel so what the heck? There are several "other thing"s communicating, as well as openais. In particular tou might see ccsd or DLM traffic. Neither of which uses the openais transports. > So, If I have 2 clusters in the same subnet, how to tell these things to be different? cman separates the cluster by cluster name. In fact it hashes the name into a 16 bit cluster number and uses that to generate a multicast address. Though this can be overriden in cluster.conf. If you have two cluster son one subnet then they will probably use different multicast addresses if the hash is different. If you're unlucky enough to have a clash of hashes and they two clusters decide to use the same multicast address, then you can override either the cluster ID or the multicast address. or When openais is started by cman, the openais.conf file is not used. Many of > the configuration parameters listed in openais.conf can be set in cluster.conf > instead. > > See the openais.conf man page for the specific parameters that can be set in > these sections. Note that settings in the section will > override any comparable settings in the openais sections above (in particular, > bindnetaddr, mcastaddr, mcastport and nodeid will always be replaced by values > in ). > "" > > So, if I put in cluster.conf: > > > > > ... for each node, what it configures? Cman or openais? cman runs as a plugin to openais. 
so cman uses openais as its messaggn and membership system -- Chrissie From p.elmers at gmx.de Mon Apr 21 08:53:04 2008 From: p.elmers at gmx.de (Peter) Date: Mon, 21 Apr 2008 10:53:04 +0200 Subject: [Linux-cluster] Meaning of Cluster Cycle and timeout problems - GFS 100% cpu utilization In-Reply-To: <1208458496.6053.150.camel@ayanami.boston.devel.redhat.com> References: <1208458496.6053.150.camel@ayanami.boston.devel.redhat.com> Message-ID: <3BB695F1-D2A9-4FDA-8373-9229C15838C2@gmx.de> Hi, Thanks for the fast response! It looks like GFS causes 100% cpu utilization and therefore the qdiskd process has no processor time. Is this a known problem and has anyone seen such behavior before? We are using rhel 4.5 with the following packages: ccs-1.0.11-1.x86_64.rpm cman-1.0.17-0.x86_64.rpm cman-kernel-2.6.9-53.5.x86_64.rpm dlm-1.0.7-1.x86_64.rpm dlm-kernel-2.6.9-52.2.x86_64.rpm fence-1.32.50-2.x86_64.rpm GFS-6.1.15-1.x86_64.rpm GFS-kernel-2.6.9-75.9.x86_64.rpm gulm-1.0.10-0.x86_64.rpm iddev-2.0.0-4.x86_64.rpm lvm2-cluster-2.02.27-2.el4.x86_64.rpm magma-1.0.8-1.x86_64.rpm magma-plugins-1.0.12-0.x86_64.rpm perl-Net-Telnet-3.03-3.noarch.rpm rgmanager-1.9.72-1.x86_64.rpm system-config-cluster-1.0.51-2.0.noarch.rpm The Kernel is 2.6.9-55. Thanks for reading and answering, Peter Am 17.04.2008 um 20:54 schrieb Lon Hohberger: > On Thu, 2008-04-17 at 09:08 +0200, Peter wrote: >> Hi! >> >> In our Cluster we have the following entry in the "messages" logfile: >> >> "qdiskd[4314]: qdisk cycle took more than 3 seconds to >> complete (3.890000)" > > It means it took more than 3 seconds for one qdiskd cycle to complete. > This is a whole lot: > > 8192 bytes in 16 block reads > some internal calculations > 512 bytes in 1 block write > > (that's it...) > > >> Theese messages are very frequent. I can not find anything except the >> source code via google and i am sorry to say that i am not so familar >> with c to get the point. >> >> >> We also have sometimes a quorum timeout: >> >> "kernel: CMAN: Quorum device /dev/sdh timed out" >> >> >> Are theese two messages independent and what is the meaning of the >> first message? > > > No, they're 100% related. It sounds like qdiskd is getting starved > for > I/O to /dev/sdh, or possibly it's getting CPU-starved for some reason. > Being that it's more or less a real-time program which helps keep the > cluster running, that's bad! In your case, it's getting hung up for > longer than the cluster failover time, so CMAN thinks qdiskd has died. > Not good. > > > (1) Turn *off* status_file if you have it enabled! It's for > debugging, > and under certain load patterns, it can really slow down qdiskd. > > > (2) If you think it's I/O, what you should try is (assuming you're > using > cluster2/rhel5/centos5/etc. here): > > echo deadline > /sys/block/sdh/queue > > If you had a default of 10 seconds (1 interval 10 tko), you should > also > do: > > echo 2500 > /sys/block/sdh/queue/iosched/write_expire > > ... you've got at least 3 for interval, so I'm not sure this would > apply > to you. > > [NOTE: On rhel4/centos4/stable, I think you have to set the I/O > scheduler globally in the kernel command line at system boot.] > > > (3) If you think qdiskd is getting CPU starved, you can adjust the > 'scheduler' and 'priority' values in cluster.conf to something > different. I think the man page might be wrong; I think the highest > 'priority' value for the 'rr' scheduler is 99, not 100. See the > qdisk(5) man page for more information on those. 
> > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2209 bytes Desc: not available URL: From rpeterso at redhat.com Mon Apr 21 12:08:07 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 21 Apr 2008 07:08:07 -0500 Subject: [Linux-cluster] Meaning of Cluster Cycle and timeout problems - GFS 100% cpu utilization In-Reply-To: <3BB695F1-D2A9-4FDA-8373-9229C15838C2@gmx.de> References: <1208458496.6053.150.camel@ayanami.boston.devel.redhat.com> <3BB695F1-D2A9-4FDA-8373-9229C15838C2@gmx.de> Message-ID: <1208779687.31105.55.camel@technetium.msp.redhat.com> On Mon, 2008-04-21 at 10:53 +0200, Peter wrote: > Hi, > > Thanks for the fast response! > > It looks like GFS causes 100% cpu utilization and therefore the qdiskd > process has no processor time. > > Is this a known problem and has anyone seen such behavior before? Hi Peter, I'm not aware of any problems in GFS that cause this symptom. Can you get a call trace with the magic sysrq key? (i.e. sysrq-t) Regards, Bob Peterson Red Hat Clustering & GFS From martin.fuerstenau at oce.com Mon Apr 21 13:39:42 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Mon, 21 Apr 2008 15:39:42 +0200 Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error -16 Message-ID: <1208785182.15835.27.camel@lx002140.ops.de> Hi there, I am running a 2 node cluster with Centos 5. After a clusterswitch last weekend I am not able to mount one of the filesystems on both nodes. Read a lot last hours in the internet but unfortunately I found neither a solution nor a hint where the error is. When I try mount -t gfs /dev/VolGroup01/ClusterVol01 /mnt/tmp I get /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -16 /sbin/mount.gfs: error mounting lockproto lock_dlm Has anybody an idea where to look for or what to do? Thanx, Martin Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leo.pleiman at yahoo.com Mon Apr 21 13:48:53 2008 From: leo.pleiman at yahoo.com (Leo Pleiman) Date: Mon, 21 Apr 2008 06:48:53 -0700 (PDT) Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error -16 Message-ID: <419691.64123.qm@web56910.mail.re3.yahoo.com> I believe you will find that the nodes haven't properly joined the fence domain. Try the following commands: ccs_tool lsfence ccs_tool lsnode cman_tool status cman_tool services ...also check /var/log/messages --Leo ----- Original Message ---- From: Martin Fuerstenau To: linux-cluster at redhat.com Sent: Monday, April 21, 2008 9:39:42 AM Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error -16 Hi there, I am running a 2 node cluster with Centos 5. 
After a clusterswitch last weekend I am not able to mount one of the filesystems on both nodes. Read a lot last hours in the internet but unfortunately I found neither a solution nor a hint where the error is. When I try mount -t gfs /dev/VolGroup01/ClusterVol01 /mnt/tmp I get /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -16 /sbin/mount.gfs: error mounting lockproto lock_dlm Has anybody an idea where to look for or what to do? Thanx, Martin This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at tangent.co.za Mon Apr 21 14:00:04 2008 From: lists at tangent.co.za (Chris Picton) Date: Mon, 21 Apr 2008 14:00:04 +0000 (UTC) Subject: [Linux-cluster] GNBD speed Message-ID: Hi All I am testing the following scenario: A DRBD mirror between two servers, which heartbeat failover the drbd primary, gnbd export and ip address. I am trying to find potential bottlenecks, and have done the following tests. Network speed between the DRBD servers (A and B) --------------------------------------------------------- (A) dd if=/dev/zero bs=1G count=1 | nc 10.100.1.2 5001 (B) nc k -l 5001 | dd of=/dev/null (A) reports: 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 7.6384 seconds, 141 MB/s DRBD sync speed: ---------------------------------------------------------- dd if=/dev/zero bs=1G count=1 of=/dev/drbd0 oflag=sync 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 10.7832 seconds, 99.6 MB/s Network speed between GNBD export (A) and import (C) ----------------------------------------------------------- (C) dd if=/dev/zero bs=1G count=1 | nc nfs1 5001 (A) nc -k -l 5001 | dd of=/dev/null (C) reports: 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 10.4001 seconds, 103 MB/s Network speed between GNBD import (C) and export (A) ----------------------------------------------------------- (A) dd if=/dev/zero bs=1G count=1 | nc 10.200.3.10 5001 (C) nc -k -l 5001 | dd of=/dev/null (A) reports: 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 10.4001 seconds, 93 MB/s So I have established that writing to drbd directly is fast, and network speed is fast However, using gnbd as follows: on the drbd server: gnbd_serv -n /sbin/gnbd_export -c -e r0 -d /dev/drbd0 On the client: gnbd_import -i 10.200.3.3 I try the speed tests over the gnbd devices: Reading from GNBD: ------------------------------------------------------------ dd if=/dev/gnbd0 of=/dev/null bs=1G count=1 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 17.0842 seconds, 62.8 MB/s Writing to GNBD (no sync flag) ------------------------------------------------------------ dd if=/dev/zero of=/dev/gnbd0 bs=1G count=1 1+0 records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 54.4142 seconds, 19.7 MB/s Writing to GNBD (sync flag) ------------------------------------------------------------ dd if=/dev/zero of=/dev/gnbd0 bs=1G count=1 oflag=sync 1+0 
records in 1+0 records out 1073741824 bytes (1.1 GB) copied, 53.3085 seconds, 20.1 MB/s I am almost happy with the 62 Mb/s read speed, but the 20 MB/sec write speed seems a bit low, compared to the write rate to drbd, and the network speed. Can anyone give any hints for optimising the gnbd write speed (and read speed) Chris From martin.fuerstenau at oce.com Mon Apr 21 14:14:01 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Mon, 21 Apr 2008 16:14:01 +0200 Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error -16 In-Reply-To: <419691.64123.qm@web56910.mail.re3.yahoo.com> References: <419691.64123.qm@web56910.mail.re3.yahoo.com> Message-ID: <1208787241.19274.6.camel@lx002140.ops.de> Hi there, there is a main difference when I use "cman_tool services". On the working node: [root at node2 ~]# cman_tool services type level name id state fence 0 default 00010001 none [1 2] dlm 1 clvmd 00020001 none [1 2] dlm 1 CfusterGFS01 00040001 none [1 2] dlm 1 ClusterGFS02 00060001 none [2] dlm 1 rgmanager 00070001 none [1 2] gfs 2 CfusterGFS01 00030001 none [1 2] gfs 2 ClusterGFS02 00050001 none [2] On the non-working node: [root at node1 ~]# cman_tool services type level name id state fence 0 default 00010001 none [1 2] dlm 1 clvmd 00020001 none [1 2] dlm 1 CfusterGFS01 00040001 none [1 2] dlm 1 rgmanager 00070001 none [1 2] gfs 2 CfusterGFS01 00030001 none [1 2] Well - but CfusterGFS01 is not mounted on node1. Seems to me like dlm (or gfs) is thinking that the filesystem is mounted. But it isn't. Has anybody an idea how to kick it out? Thx - Martin On Mon, 2008-04-21 at 06:48 -0700, Leo Pleiman wrote: > I believe you will find that the nodes haven't properly joined the > fence domain. Try the following commands: > > ccs_tool lsfence > ccs_tool lsnode > cman_tool status > cman_tool services > > ...also check /var/log/messages > > --Leo > > > ----- Original Message ---- > From: Martin Fuerstenau > To: linux-cluster at redhat.com > Sent: Monday, April 21, 2008 9:39:42 AM > Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error > -16 > > Hi there, > > I am running a 2 node cluster with Centos 5. After a clusterswitch > last weekend I am not able to mount one of the filesystems on both > nodes. Read a lot last hours in the internet but unfortunately I found > neither a solution nor a hint where the error is. > > When I try > > mount -t gfs /dev/VolGroup01/ClusterVol01 /mnt/tmp > > > I get > > /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -16 > /sbin/mount.gfs: error mounting lockproto lock_dlm > > Has anybody an idea where to look for or what to do? > > Thanx, Martin > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. 
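A hedged diagnostic sketch for the stale-mount situation above, assuming the standard CentOS 5 cman/groupd tools: comparing the group state on both nodes shows whether gfs_controld still holds a mount group for that volume, and a leftover group would explain the -16 (EBUSY) join error.

    group_tool ls
    group_tool dump gfs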
From connerf at ncifcrf.gov Mon Apr 21 15:03:10 2008 From: connerf at ncifcrf.gov (fred conner) Date: Mon, 21 Apr 2008 11:03:10 -0400 Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error -16 In-Reply-To: <419691.64123.qm@web56910.mail.re3.yahoo.com> References: <419691.64123.qm@web56910.mail.re3.yahoo.com> Message-ID: <1208790190.3205.12.camel@norbert> try running the command fence_tool join on the node you are getting the error then mount the filesystem On Mon, 2008-04-21 at 06:48 -0700, Leo Pleiman wrote: > I believe you will find that the nodes haven't properly joined the > fence domain. Try the following commands: > > ccs_tool lsfence > ccs_tool lsnode > cman_tool status > cman_tool services > > ...also check /var/log/messages > > --Leo > > > ----- Original Message ---- > From: Martin Fuerstenau > To: linux-cluster at redhat.com > Sent: Monday, April 21, 2008 9:39:42 AM > Subject: [Linux-cluster] Problem lock_dlm_join gfs_controld join error > -16 > > Hi there, > > I am running a 2 node cluster with Centos 5. After a clusterswitch > last weekend I am not able to mount one of the filesystems on both > nodes. Read a lot last hours in the internet but unfortunately I found > neither a solution nor a hint where the error is. > > When I try > > mount -t gfs /dev/VolGroup01/ClusterVol01 /mnt/tmp > > > I get > > /sbin/mount.gfs: lock_dlm_join: gfs_controld join error: -16 > /sbin/mount.gfs: error mounting lockproto lock_dlm > > Has anybody an idea where to look for or what to do? > > Thanx, Martin > Visit Oce at drupa! Register online now: > > This message and attachment(s) are intended solely for use by the > addressee and may contain information that is privileged, confidential > or otherwise exempt from disclosure under applicable law. If you are > not the intended recipient or agent thereof responsible for delivering > this message to the intended recipient, you are hereby notified that > any dissemination, distribution or copying of this communication is > strictly prohibited. If you have received this communication in error, > please notify the sender immediately by telephone and with a 'reply' > message. Thank you for your co-operation. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Fred Conner [Contractor] From lhh at redhat.com Mon Apr 21 16:58:14 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 21 Apr 2008 12:58:14 -0400 Subject: [Linux-cluster] script resource start, stop and status timeouts In-Reply-To: <779919740804181249k624a6a95le2ffe7c81319bf4@mail.gmail.com> References: <779919740804181249k624a6a95le2ffe7c81319bf4@mail.gmail.com> Message-ID: <1208797094.23820.1.camel@ayanami.boston.devel.redhat.com> On Fri, 2008-04-18 at 13:49 -0600, Jeremy Lyon wrote: > Hi, > > I'm currently using cluster on RHEL 4.6 and will be soon using moving > to cluster on RHEL 5.1. We are using some script resources and I'm > trying to find if there are timeouts on the start, stop and status > functions. If so, what are the defaults and can they be tuned? Nope, they behave the same as on rhel4.6 right now. 
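For context, a minimal sketch of how a script resource is usually wired into cluster.conf (names and paths here are made up):

    <rm>
      <resources>
        <script name="myapp" file="/etc/init.d/myapp"/>
      </resources>
      <service name="app-svc" autostart="1" domain="prefer-node1">
        <script ref="myapp"/>
      </service>
    </rm>

rgmanager invokes the referenced init script with start, stop and status arguments through /usr/share/cluster/script.sh, which is where the (currently unenforced) action timeouts mentioned earlier in the thread are declared.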
From lhh at redhat.com Mon Apr 21 16:58:37 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 21 Apr 2008 12:58:37 -0400
Subject: [Linux-cluster] script resource start, stop and status timeouts
In-Reply-To: <36F4E74FA8263744A6B016E6A461EFF603317E23@dino.eu.tieto.com>
References: <779919740804181249k624a6a95le2ffe7c81319bf4@mail.gmail.com> <36F4E74FA8263744A6B016E6A461EFF603317E23@dino.eu.tieto.com>
Message-ID: <1208797117.23820.3.camel@ayanami.boston.devel.redhat.com>

On Sat, 2008-04-19 at 06:39 +0300, Harri.Paivaniemi at tietoenator.com wrote:
> Those can be tuned,
>
> /usr/share/cluster/script.sh
>
> ... is for normal init scripts. There are also plenty of other scripts
> there for different purposes.
>
> In that script:
>
>
>
>
>
>
> .... tells it's infinite timeout by default.

True, but the timeouts aren't currently enforced.

-- Lon

From Samuel.Kielek at marriott.com Mon Apr 21 17:22:21 2008
From: Samuel.Kielek at marriott.com (Kielek, Samuel)
Date: Mon, 21 Apr 2008 13:22:21 -0400
Subject: [Linux-cluster] Event in one failover domain affecting another separate failover domain
Message-ID: <140D865F4BA13C4B9D3AFEFEAD1EA532057BB5EE@HDQNCEXCL1V2.mihdq.marrcorp.marriott.com>

I have a 3 node RHEL 4.6 cluster with two failover domains. The idea is
that two of the nodes are primary for their respective services and the
remaining node is a shared failover node for both of the services. Here
is an example of how the two ordered domains are configured:

DOMAIN_ONE (service_one)
  server_a (priority=1)
  server_b (priority=2)

DOMAIN_TWO (service_two)
  server_c (priority=1)
  server_b (priority=2)

The issue I have observed is that when server_c (DOMAIN_TWO) had an
issue that led to it being fenced, the service running on server_a
(service_one) immediately stopped and relocated to server_b (the
recovery action is set to "relocate" for both services).

What I don't understand is how a failure in DOMAIN_TWO with service_two
on server_c would affect service_one running on server_a in DOMAIN_ONE.
The logs do not provide any obvious hints. Here is a snippet from the
messages log on server_a for this time period:

11:10:56 server_a fenced[11638]: fencing node "server_c"
11:12:03 server_a fenced[11638]: fence "server_c" success
11:12:04 server_a clurgmgrd[11776]: Magma Event: Membership Change
11:12:04 server_a clurgmgrd[11776]: State change: server_c DOWN
11:12:04 server_a clurgmgrd[11776]: Stopping service service_one
11:12:04 server_a clurgmgrd: [11776]: Executing /etc/init.d/service_one stop

As you can see, there is no indication as to why service_one is being
stopped. The last two events in the above log should not have occurred.

Has anyone else ever had this sort of issue? I'm not sure if this is a
bug or a config problem.

Thanks,
Sam

From lhh at redhat.com Mon Apr 21 17:29:50 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 21 Apr 2008 13:29:50 -0400
Subject: [Linux-cluster] operation configuration: start timeout
In-Reply-To: <4808A860.4070300@sap.com>
References: <4808A860.4070300@sap.com>
Message-ID: <1208798990.23820.10.camel@ayanami.boston.devel.redhat.com>

On Fri, 2008-04-18 at 15:55 +0200, Matthias Schlarb wrote:
> Hello,
>
> how can I add a start timeout for a cluster resource in cluster.conf?

Currently, these are present but not enforced. A simple timeout on
start is not a "clean" failure case; the stop phase of the script would
need to be called, and the stop phase would need to complete
successfully in order to allow starting the resource on a different
node in the cluster.

A timeout exceeded on "stop" would make the resource mostly
unrecoverable (requiring manual intervention). A timeout exceeded on
"status" would probably mean a failure (right?)

-- Lon
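Since neither cluster.conf nor the agent metadata timeouts are enforced
today, the only place to get timeout behaviour is inside the script
resource itself. A rough, untested sketch of such a wrapper follows;
the path /etc/init.d/myapp and the 120-second limit are made up for
illustration, and rgmanager would be pointed at the wrapper instead of
the real init script:

    #!/bin/sh
    # Hypothetical wrapper: impose a local time limit on a slow init
    # script action, since rgmanager does not enforce declared timeouts.
    REAL_SCRIPT=/etc/init.d/myapp   # made-up service name
    LIMIT=120                       # seconds

    "$REAL_SCRIPT" "$1" &
    PID=$!
    ( sleep "$LIMIT"; kill -9 "$PID" 2>/dev/null ) &
    WATCHDOG=$!
    wait "$PID"
    RC=$?
    kill "$WATCHDOG" 2>/dev/null
    exit "$RC"

Note that this only papers over the problem Lon describes above: if the
start phase is killed part-way through, the stop phase still has to run
and succeed before the service can be started cleanly on another node.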
From lhh at redhat.com Mon Apr 21 18:02:22 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 21 Apr 2008 14:02:22 -0400
Subject: [Linux-cluster] Event in one failover domain affecting another separate failover domain
In-Reply-To: <140D865F4BA13C4B9D3AFEFEAD1EA532057BB5EE@HDQNCEXCL1V2.mihdq.marrcorp.marriott.com>
References: <140D865F4BA13C4B9D3AFEFEAD1EA532057BB5EE@HDQNCEXCL1V2.mihdq.marrcorp.marriott.com>
Message-ID: <1208800942.23820.30.camel@ayanami.boston.devel.redhat.com>

On Mon, 2008-04-21 at 13:22 -0400, Kielek, Samuel wrote:
> The issue I have observed is that when server_c (DOMAIN_TWO) had an
> issue that led to it being fenced, the service running on server_a
> (service_one) immediately stopped and relocated to server_b (the
> recovery action is set to "relocate" for both services).

Your cluster.conf would be helpful.

Also, you can increase the log level to 'debug', which would tell you
more; see "Logging Configuration":

http://sources.redhat.com/cluster/wiki/RGManager

...for more information.

-- Lon

From lhh at redhat.com Mon Apr 21 18:04:21 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Mon, 21 Apr 2008 14:04:21 -0400
Subject: [Linux-cluster] Meaning of Cluster Cycle and timeout problems - GFS 100% cpu utilization
In-Reply-To: <3BB695F1-D2A9-4FDA-8373-9229C15838C2@gmx.de>
References: <1208458496.6053.150.camel@ayanami.boston.devel.redhat.com> <3BB695F1-D2A9-4FDA-8373-9229C15838C2@gmx.de>
Message-ID: <1208801061.23820.31.camel@ayanami.boston.devel.redhat.com>

On Mon, 2008-04-21 at 10:53 +0200, Peter wrote:
> Hi,
>
> Thanks for the fast response!
>
> It looks like GFS causes 100% cpu utilization and therefore the qdiskd
> process has no processor time.

Ouch! That would do it. It sounds like a bug.

> > (3) If you think qdiskd is getting CPU starved, you can adjust the
> > 'scheduler' and 'priority' values in cluster.conf to something
> > different. I think the man page might be wrong; I think the highest
> > 'priority' value for the 'rr' scheduler is 99, not 100. See the
> > qdisk(5) man page for more information on those.

^^ You can still set qdiskd's priority to 99 if you want ;)

-- Lon

From Samuel.Kielek at marriott.com Mon Apr 21 18:45:21 2008
From: Samuel.Kielek at marriott.com (Kielek, Samuel)
Date: Mon, 21 Apr 2008 14:45:21 -0400
Subject: [Linux-cluster] Event in one failover domain affecting another separate failover domain
In-Reply-To: <1208800942.23820.30.camel@ayanami.boston.devel.redhat.com>
References: <140D865F4BA13C4B9D3AFEFEAD1EA532057BB5EE@HDQNCEXCL1V2.mihdq.marrcorp.marriott.com> <1208800942.23820.30.camel@ayanami.boston.devel.redhat.com>
Message-ID: <140D865F4BA13C4B9D3AFEFEAD1EA532057BB685@HDQNCEXCL1V2.mihdq.marrcorp.marriott.com>

Ok, I've set the log level to debug, so hopefully next time this
happens I can get more info. Of course, this is a production cluster,
so there is only so much I can do in terms of testing...

Here is the cluster.conf (sanitized but otherwise accurate):