From dde at twn.tuv.com Wed Apr 1 00:20:40 2009 From: dde at twn.tuv.com (Denis Anthony Dowling/Twn/TUV) Date: Wed, 1 Apr 2009 08:20:40 +0800 Subject: [Linux-cluster] Disable fencing in a non-shared storage cluster Message-ID: I'm trying to configure a high availability cluster for Squid. There will be no shared storage device. The problem relates to the time required for starting and stopping the fencing daemon. Is it possible just to disable this? I've tried the "clean_start=0" and "post_join_delay=-1" but without success. -------------- next part -------------- An HTML attachment was scrubbed... URL: From gianluca.cecchi at gmail.com Wed Apr 1 15:07:06 2009 From: gianluca.cecchi at gmail.com (Gianluca Cecchi) Date: Wed, 1 Apr 2009 17:07:06 +0200 Subject: [Linux-cluster] Can same cluster name in same subnet? Message-ID: <561c252c0904010807v16e3d88av78ba899e5aa6c5d@mail.gmail.com> Conversely, how is it dangerous to have two two-node-clusters with different names sharing the intra-cluster network? In particular if one is in production and the other is for testing? And what about relative multicast-adresses for these two clusters? Can I safely use same multicast if the names are different or do I have to change? Ant rule in this case? Thanks, Gianluca From ccaulfie at redhat.com Wed Apr 1 15:13:20 2009 From: ccaulfie at redhat.com (Chrissie Caulfield) Date: Wed, 01 Apr 2009 16:13:20 +0100 Subject: [Linux-cluster] Can same cluster name in same subnet? In-Reply-To: <561c252c0904010807v16e3d88av78ba899e5aa6c5d@mail.gmail.com> References: <561c252c0904010807v16e3d88av78ba899e5aa6c5d@mail.gmail.com> Message-ID: <49D38490.6070300@redhat.com> Gianluca Cecchi wrote: > Conversely, how is it dangerous to have two two-node-clusters with > different names sharing the intra-cluster network? > In particular if one is in production and the other is for testing? > And what about relative multicast-adresses for these two clusters? Can > I safely use same multicast if the names are different or do I have to > change? Ant rule in this case? > I would strongly advise against using the same multicast address for two different cluster in the same subnet. Ideally all clusters should use different multicast addresses. -- Chrissie From sdake at redhat.com Wed Apr 1 17:29:37 2009 From: sdake at redhat.com (Steven Dake) Date: Wed, 01 Apr 2009 10:29:37 -0700 Subject: [Linux-cluster] Can same cluster name in same subnet? In-Reply-To: <561c252c0904010807v16e3d88av78ba899e5aa6c5d@mail.gmail.com> References: <561c252c0904010807v16e3d88av78ba899e5aa6c5d@mail.gmail.com> Message-ID: <1238606977.22887.15.camel@sdake-laptop> On Wed, 2009-04-01 at 17:07 +0200, Gianluca Cecchi wrote: > Conversely, how is it dangerous to have two two-node-clusters with > different names sharing the intra-cluster network? > In particular if one is in production and the other is for testing? > And what about relative multicast-adresses for these two clusters? Can > I safely use same multicast if the names are different or do I have to > change? Ant rule in this case? > > Thanks, > Gianluca > The multicast address/port always should be unique for each cluster or bad things will happen. It uniquely identifies the cluster and nothing else matters for unique identification. 
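As a rough illustration only (the cluster names and 239.192.x.x addresses below are placeholders, not values from this thread), the separation can be pinned down explicitly in each cluster.conf instead of relying on the address cman derives from the cluster name:

    <cluster name="prod-cluster" config_version="1">
      <cman>
        <multicast addr="239.192.10.1"/>
      </cman>
      ...
    </cluster>

    <cluster name="test-cluster" config_version="1">
      <cman>
        <multicast addr="239.192.10.2"/>
      </cman>
      ...
    </cluster>

Differently named clusters normally end up on different default addresses anyway, since cman derives the default multicast address from the cluster name/ID; setting it by hand just makes the separation visible.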
regards -steve > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rhurst at bidmc.harvard.edu Wed Apr 1 17:50:38 2009 From: rhurst at bidmc.harvard.edu (Robert Hurst) Date: Wed, 01 Apr 2009 13:50:38 -0400 Subject: [Linux-cluster] Disable fencing in a non-shared storage cluster In-Reply-To: References: Message-ID: <1238608238.18992.2.camel@WSBID06223.bidmc.harvard.edu> Yes, if you are not using GFS, then why start the fencing daemon at all? It's only required for GFS. ________________________________________________________________________ Robert Hurst, Sr. Cach? Administrator Beth Israel Deaconess Medical Center 1135 Tremont Street, REN-7 Boston, Massachusetts 02120-2140 617-754-8754 ? Fax: 617-754-8730 ? Cell: 401-787-3154 Any technology distinguishable from magic is insufficiently advanced. On Wed, 2009-04-01 at 08:20 +0800, Denis Anthony Dowling/Twn/TUV wrote: > > I'm trying to configure a high availability cluster for Squid. There > will be no shared storage device. The problem relates to the time > required for starting and stopping the fencing daemon. Is it possible > just to disable this? > I've tried the "clean_start=0" and "post_join_delay=-1" but without > success. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.suchy at enlogit.cz Wed Apr 1 18:04:46 2009 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Wed, 1 Apr 2009 20:04:46 +0200 Subject: [Linux-cluster] Disable fencing in a non-shared storage cluster In-Reply-To: <1238608238.18992.2.camel@WSBID06223.bidmc.harvard.edu> References: <1238608238.18992.2.camel@WSBID06223.bidmc.harvard.edu> Message-ID: <20090401180446.GC28473@galatea> Robert, even for HA cluster, you usually need Virtual IP to run your service. This IP can have the same problem as GFS - you can't know if failed node is really down or it's just a glitch -> you need fencing. => you need fencing everytime. Best, Jakub Suchy > Yes, if you are not using GFS, then why start the fencing daemon at all? > It's only required for GFS. > > > On Wed, 2009-04-01 at 08:20 +0800, Denis Anthony Dowling/Twn/TUV wrote: > > > > > I'm trying to configure a high availability cluster for Squid. There > > will be no shared storage device. The problem relates to the time > > required for starting and stopping the fencing daemon. Is it possible > > just to disable this? > > I've tried the "clean_start=0" and "post_join_delay=-1" but without > > success. > > From teigland at redhat.com Wed Apr 1 19:03:48 2009 From: teigland at redhat.com (David Teigland) Date: Wed, 1 Apr 2009 14:03:48 -0500 Subject: [Linux-cluster] Disable fencing in a non-shared storage cluster In-Reply-To: <20090401180446.GC28473@galatea> References: <1238608238.18992.2.camel@WSBID06223.bidmc.harvard.edu> <20090401180446.GC28473@galatea> Message-ID: <20090401190348.GA28414@redhat.com> On Wed, Apr 01, 2009 at 08:04:46PM +0200, Jakub Suchy wrote: > Robert, > even for HA cluster, you usually need Virtual IP to run your service. > This IP can have the same problem as GFS - you can't know if failed node > is really down or it's just a glitch -> you need fencing. > > => you need fencing everytime. You may want something to terminate the IP, but I wouldn't use the word "fencing" to describe it, just to avoid confusion. 
Fencing is explicitly defined as disabling access to shared storage devices. That said, you may be able to use the fencing capabilities to implement what you need. Dave From crosa at redhat.com Wed Apr 1 19:27:16 2009 From: crosa at redhat.com (Cleber Rodrigues) Date: Wed, 01 Apr 2009 16:27:16 -0300 Subject: [Linux-cluster] Disable fencing in a non-shared storage cluster In-Reply-To: <20090401190348.GA28414@redhat.com> References: <1238608238.18992.2.camel@WSBID06223.bidmc.harvard.edu> <20090401180446.GC28473@galatea> <20090401190348.GA28414@redhat.com> Message-ID: <1238614036.5711.4.camel@localhost.localdomain> On Wed, 2009-04-01 at 14:03 -0500, David Teigland wrote: > You may want something to terminate the IP, but I wouldn't use the word > "fencing" to describe it, just to avoid confusion. Fencing is explicitly > defined as disabling access to shared storage devices. > IMHO, "fencing" now means the pratical use of it. It might mean disabling access to storage, but most of the time it means STONITH. > That said, you may be able to use the fencing capabilities to implement what > you need. > And that would be...? Connecting to the (probably unreachable) machine and shutting down its network interface? Call me paranoid, but I would always go for power cycling *first*. > Dave > -- Cleber Rodrigues Solutions Architect - Red Hat, Inc. From Ed.Sanborn at genband.com Thu Apr 2 03:51:23 2009 From: Ed.Sanborn at genband.com (Ed Sanborn) Date: Wed, 1 Apr 2009 23:51:23 -0400 Subject: [Linux-cluster] Trouble after Openais upgrade to 0.80.3-22.el5 In-Reply-To: References: <20090330180739.GB6135@redhat.com> Message-ID: <593E210EDC38444DA1C17E9E9F5E264B98FC9A@GBMDMail01.genband.com> I have RHEL 5.2 I tried upgrading openais on a few of my nodes. Original version was 0.80.3-15.el5 and I upgraded to version 0.80.3-22.el5. Now the node will not connect to the cluster. I get the following error in /var/log/messages: "unable to connect to cluster infrastructure" Has anyone else run into this issue? Is there a way around this besides going back to the old version? Ed From corey.kovacs at gmail.com Thu Apr 2 05:37:05 2009 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Thu, 2 Apr 2009 06:37:05 +0100 Subject: [Linux-cluster] Trouble after Openais upgrade to 0.80.3-22.el5 In-Reply-To: <593E210EDC38444DA1C17E9E9F5E264B98FC9A@GBMDMail01.genband.com> References: <20090330180739.GB6135@redhat.com> <593E210EDC38444DA1C17E9E9F5E264B98FC9A@GBMDMail01.genband.com> Message-ID: <7d6e8da40904012237r3c4fb8d0ga77e6aacc425f497@mail.gmail.com> Ed, your upgrade requires you to downgrade your 5.3 machines to use the 5.2 openais, or you upgrade all the nodes at the same time and reboot the whole thing. There is an incompatibility between the two versions of openais. Have fun,,, -Corey To the list... Is this in a FAQ somewhere? This question seems to come up quite often? On Thu, Apr 2, 2009 at 4:51 AM, Ed Sanborn wrote: > I have RHEL 5.2 > I tried upgrading openais on a few of my nodes. Original version was > 0.80.3-15.el5 ?and I upgraded to version ?0.80.3-22.el5. > Now the node will not connect to the cluster. ?I get the following error > in > /var/log/messages: > > "unable to connect to cluster infrastructure" > > Has anyone else run into this issue? ?Is there a way around this besides > going back to > the old version? 
> > Ed > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kirbyzhou at sohu-rd.com Thu Apr 2 06:53:20 2009 From: kirbyzhou at sohu-rd.com (Kirby Zhou) Date: Thu, 2 Apr 2009 14:53:20 +0800 Subject: [Linux-cluster] How to resolve "Open Disconnected Pending" state? Message-ID: <07bc01c9b35f$b5a675a0$20f360e0$@com> The machine which exported gnbd is power off. The client machine fall into the state 'Open Disconnected Pending'. Any process access the dead gnbd fall into state 'D'. How can I destroy the gnbd block device on the client machine? [root at xen-727057 ~]# gnbd_import -n -l Device name : 63.131.xvdb ---------------------- Minor # : 0 sysfs name : /block/gnbd0 Server : 10.10.63.131 Port : 14567 State : Open Disconnected Pending Readonly : No Sectors : 16777216 [root at xen-727057 ~]# pvs & [1] 4561 [root at xen-727057 ~]# ps aux | fgrep pvs root 4561 0.6 0.1 79364 1544 pts/0 D 14:50 0:00 pvs root 4563 0.0 0.0 61112 608 pts/0 S+ 14:50 0:00 fgrep pvs [root at xen-727057 ~]# gnbd_import -n -R gnbd_import: ERROR cannot disconnect device #0 : Device or resource busy From kadlec at mail.kfki.hu Thu Apr 2 11:07:54 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Thu, 2 Apr 2009 13:07:54 +0200 (CEST) Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> Message-ID: On Tue, 31 Mar 2009, Kadlecsik Jozsef wrote: > I'll restore the kernel on a not so critical node and will try to find out > how to trigger the bug without mailman. If that succeeds then I'll remove > the patch in question and re-run the test. It'll need a few days, surely, > but I'll report the results. I had been unsuccesful to find a reliable way to trigger the freeze without mailman. So I created a backup mailman directory by which I can test the system. The following has been verified so far: - Removed commit 17968b0fe87829edff1af7fa9ffbbc92540159fb (Remove splice_read file op for jdata files) and commit 4787e11dc7831f42228b89ba7726fd6f6901a1e3 (gfs-kmod: workaround for potential deadlock. Prefault user pages), the system freezes. - Removed commit 5e83cdb08b423478a0b6cc8f6de396ab8328d47a (gfs-kernel: Bug 466645 - reproduceable gfs (dlm) hanger with simple stresstest), the system freezes. (Please note, the volumes are mounted with noatime). If you have any idea what to do next, please write it. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From s.wendy.cheng at gmail.com Thu Apr 2 14:37:51 2009 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 02 Apr 2009 09:37:51 -0500 Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> Message-ID: <49D4CDBF.6040203@gmail.com> Kadlecsik Jozsef wrote: > > If you have any idea what to do next, please write it. 
> > Do you have your kernel source somewhere (in tar ball format) so people can look into it ? -- Wendy From stevan.colaco at gmail.com Thu Apr 2 15:11:43 2009 From: stevan.colaco at gmail.com (Stevan Colaco) Date: Thu, 2 Apr 2009 18:11:43 +0300 Subject: [Linux-cluster] Unable to mount GFS File System in RHEL5.2 (32bit) Message-ID: <56bb44d0904020811u118479fdgbd6f0b004581f095@mail.gmail.com> Dear All, I have setup 2 node cluster on RHEL5.2 (32bit) + Quorum Partition + GFS Partition. i could make gfs file system but issues while trying to mount it. is it due to GFS module not loaded? unable to load GFS module. below are the details, anyone has faced this issue before, please suggest........ [root at quod-core1-uat ~]# mount -t gfs /dev/quoduat/rv /rv /sbin/mount.gfs: error mounting /dev/mapper/quoduat-rv on /rv: No such device [root at quod-core1-uat ~]# GFS rpms are installed [root at quod-core1-uat ~]# rpm -qa | grep -i gfs kmod-gfs2-1.92-1.1.el5 kmod-gfs-0.1.23-5.el5 gfs-utils-0.1.17-1.el5 gfs2-utils-0.1.44-1.el5 [root at quod-core1-uat ~]# couldn't find the gfs module loaded [root at quod-core1-uat ~]# lsmod | grep -i gfs gfs2 346344 1 lock_dlm configfs 28753 2 dlm [root at quod-core1-uat ~]# modinfo gfs throws below error:- [root at quod-core1-uat ~]# modinfo gfs modinfo: could not find module gfs [root at quod-core1-uat ~]# manually locating module, list the module [root at quod-core1-uat ~]# modinfo /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko filename: /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko license: GPL author: Red Hat, Inc. description: Global File System 0.1.23-5.el5 srcversion: F36BE93709E650F2BEC45A5 depends: gfs2 vermagic: 2.6.18-92.el5 SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1 [root at quod-core1-uat ~]# unable to install the module gfs [root at quod-core1-uat ~]# modprobe /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko FATAL: Module /lib/modules/2.6.18_92.el5/extra/gfs/gfs.ko not found. [root at quod-core1-uat ~]# [root at quod-core1-uat ~]# clustat Cluster Status for quod-clust-uat @ Thu Apr 2 18:09:45 2009 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ quod-core2-uat.kmefic.com.kw 1 Online, rgmanager quod-core1-uat.kmefic.com.kw 2 Online, Local, rgmanager /dev/sdc1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- service:quod-uat-ip quod-core1-uat.kmefic.com.kw started [root at quod-core1-uat ~]# Thanks in Advance, -Stevan Colaco From mrugeshkarnik at gmail.com Thu Apr 2 15:33:38 2009 From: mrugeshkarnik at gmail.com (Mrugesh Karnik) Date: Thu, 2 Apr 2009 21:03:38 +0530 Subject: [Linux-cluster] Network Interface Binding for cman Message-ID: <200904022103.38273.mrugeshkarnik@gmail.com> Hi, How do I specify which network interfaces to listen on, to cman? I specifically need it to listen on two interfaces. The system has four interfaces in total. I'm on CentOS 5.2. Thanks, Mrugesh From jeff.sturm at eprize.com Thu Apr 2 16:04:12 2009 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Thu, 2 Apr 2009 12:04:12 -0400 Subject: [Linux-cluster] Network Interface Binding for cman In-Reply-To: <200904022103.38273.mrugeshkarnik@gmail.com> References: <200904022103.38273.mrugeshkarnik@gmail.com> Message-ID: <64D0546C5EBBD147B75DE133D798665F02FDB6D8@hugo.eprize.local> It binds to a multicast address. That address is bound to one interface normally. If you need two interfaces, look into ethernet bonding. 
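A minimal active-backup bonding sketch on RHEL/CentOS 5 looks roughly like this (device names, address and netmask are placeholders; on older 5.x releases the mode/miimon options go into /etc/modprobe.conf as "options bond0 mode=active-backup miimon=100" rather than BONDING_OPTS):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for ifcfg-eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/modprobe.conf
    alias bond0 bonding

cman/openais then binds to the interface carrying the address that the node name in cluster.conf resolves to, so the cluster node names should resolve to the bond0 address.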
> -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Mrugesh Karnik > Sent: Thursday, April 02, 2009 11:34 AM > To: linux-cluster at redhat.com > Subject: [Linux-cluster] Network Interface Binding for cman > > Hi, > > How do I specify which network interfaces to listen on, to > cman? I specifically need it to listen on two interfaces. The > system has four interfaces in total. > > I'm on CentOS 5.2. > > Thanks, > Mrugesh > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From arwin.tugade at csun.edu Thu Apr 2 16:31:31 2009 From: arwin.tugade at csun.edu (Arwin L Tugade) Date: Thu, 2 Apr 2009 09:31:31 -0700 Subject: [Linux-cluster] rgmanager stop just hangs, clurgmgrd never terminates Message-ID: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C38F@CSUN-EX-V01.csun.edu> Hey all, I ran into an issue where my cluster was quorate but none of the services were showing up via the clustat command. When I tried to do a /sbin/service rgmanager stop, it hangs indefinitely. The sigterm is sent but the clurgmgrd processes don't stop. What I ended up doing was manually kill off clurgmgrd, remove the pid file from /var/run/, restart cman and ultimately had to restart clvmd. I'm on RHEL5U3 (x86_64), 2 node with a qdisk. I'm also having this same rgmanager hang on RHEL5U2 (x86_64) 3 node. Am I doing something wrong here? Thanks, Arwin -------------- next part -------------- An HTML attachment was scrubbed... URL: From fernando at lozano.eti.br Thu Apr 2 17:38:06 2009 From: fernando at lozano.eti.br (Fernando Lozano) Date: Thu, 02 Apr 2009 14:38:06 -0300 Subject: [Linux-cluster] rgmanager stop just hangs, clurgmgrd never terminates In-Reply-To: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C38F@CSUN-EX-V01.csun.edu> References: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C38F@CSUN-EX-V01.csun.edu> Message-ID: <49D4F7FE.5060906@lozano.eti.br> Hi Arwin, I have the same problem on a two-node cluster (two KVM vitual machines) and on another two-node cluster with real Dell servers. If I flush iptables rules BEFORE starting cman, everything works fine. But if I start cman and rgmanager with iptables rules, I see no services and rgmanager hangs. Flusing iptables rules after starting cman changes anything. :-( I have all ports open as stated by RHCS manual, but it wasn't enough. I still cannot find why rgmanager hangs and which rules my iptables setup is missing, but I have the same behaviour on another setup with two VMware virtual machines. I don't use qdisk, clvmd nor gfs. My clustert setup has clean_start="1" on fenced. I'm on RHEL5.2, tried both 32 and 64-bits. Have you tried starting your cluster with no firewall? []s, Fernando Lozano > Hey all, > > > > I ran into an issue where my cluster was quorate but none of the > services were showing up via the clustat command. When I tried to do > a /sbin/service rgmanager stop, it hangs indefinitely. The sigterm is > sent but the clurgmgrd processes don?t stop. What I ended up doing > was manually kill off clurgmgrd, remove the pid file from /var/run/, > restart cman and ultimately had to restart clvmd. I?m on RHEL5U3 > (x86_64), 2 node with a qdisk. I?m also having this same rgmanager > hang on RHEL5U2 (x86_64) 3 node. Am I doing something wrong here? 
> > > > Thanks, > > Arwin > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From arwin.tugade at csun.edu Thu Apr 2 18:38:33 2009 From: arwin.tugade at csun.edu (Arwin L Tugade) Date: Thu, 2 Apr 2009 11:38:33 -0700 Subject: [Linux-cluster] rgmanager stop just hangs, clurgmgrd never terminates In-Reply-To: <49D4F7FE.5060906@lozano.eti.br> References: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C38F@CSUN-EX-V01.csun.edu> <49D4F7FE.5060906@lozano.eti.br> Message-ID: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C390@CSUN-EX-V01.csun.edu> Yup, matter of fact, I disabled iptables altogether. The cluster comes up fine and I have services running once again (this is a test setup btw). Just to let you know I managed to get the cluster in this state when I was doing some failover testing. I'm just wondering why when I do a /sbin/service rgmanager {stop|restart} it hangs indefinitely. Btw, a question about that clean_start directive. I'm reading the fenced man page and will the value of "1" prevent a fencing loop at startup. I've seen it where I bring up 1 node, and then bring up node 2 and node 2 fences node1 and I see this in the log: Apr 1 22:47:14 oilfish openais[4643]: [CPG ] got joinlist message from node 1 Apr 1 22:47:14 oilfish openais[4643]: [CPG ] got joinlist message from node 2 Apr 1 22:47:15 oilfish openais[4643]: [CMAN ] cman killed by node 2 because we rejoined the cluster without a full restart Arwin -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Fernando Lozano Sent: Thursday, April 02, 2009 10:38 AM To: linux clustering Subject: Re: [Linux-cluster] rgmanager stop just hangs, clurgmgrd never terminates Hi Arwin, I have the same problem on a two-node cluster (two KVM vitual machines) and on another two-node cluster with real Dell servers. If I flush iptables rules BEFORE starting cman, everything works fine. But if I start cman and rgmanager with iptables rules, I see no services and rgmanager hangs. Flusing iptables rules after starting cman changes anything. :-( I have all ports open as stated by RHCS manual, but it wasn't enough. I still cannot find why rgmanager hangs and which rules my iptables setup is missing, but I have the same behaviour on another setup with two VMware virtual machines. I don't use qdisk, clvmd nor gfs. My clustert setup has clean_start="1" on fenced. I'm on RHEL5.2, tried both 32 and 64-bits. Have you tried starting your cluster with no firewall? []s, Fernando Lozano > Hey all, > > > > I ran into an issue where my cluster was quorate but none of the > services were showing up via the clustat command. When I tried to do > a /sbin/service rgmanager stop, it hangs indefinitely. The sigterm is > sent but the clurgmgrd processes don?t stop. What I ended up doing > was manually kill off clurgmgrd, remove the pid file from /var/run/, > restart cman and ultimately had to restart clvmd. I?m on RHEL5U3 > (x86_64), 2 node with a qdisk. I?m also having this same rgmanager > hang on RHEL5U2 (x86_64) 3 node. Am I doing something wrong here? 
> > > > Thanks, > > Arwin > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From sdake at redhat.com Thu Apr 2 18:59:15 2009 From: sdake at redhat.com (Steven Dake) Date: Thu, 02 Apr 2009 11:59:15 -0700 Subject: [Linux-cluster] Trouble after Openais upgrade to 0.80.3-22.el5 In-Reply-To: <593E210EDC38444DA1C17E9E9F5E264B98FC9A@GBMDMail01.genband.com> References: <20090330180739.GB6135@redhat.com> <593E210EDC38444DA1C17E9E9F5E264B98FC9A@GBMDMail01.genband.com> Message-ID: <1238698755.4602.17.camel@sdake-laptop> Likely you ran into the segfault that happens during the upgrade process from some 5.2 to 5.3 nodes. You can reboot your cluster with all either 5.2 or alternatively 5.3 nodes or wait until the 5.3.z stream release becomes available which resolves this problem. regards -steve On Wed, 2009-04-01 at 23:51 -0400, Ed Sanborn wrote: > I have RHEL 5.2 > I tried upgrading openais on a few of my nodes. Original version was > 0.80.3-15.el5 and I upgraded to version 0.80.3-22.el5. > Now the node will not connect to the cluster. I get the following error > in > /var/log/messages: > > "unable to connect to cluster infrastructure" > > Has anyone else run into this issue? Is there a way around this besides > going back to > the old version? > > Ed > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kadlec at mail.kfki.hu Thu Apr 2 19:29:45 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Thu, 2 Apr 2009 21:29:45 +0200 (CEST) Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: <49D4CDBF.6040203@gmail.com> References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> <49D4CDBF.6040203@gmail.com> Message-ID: Hi, On Thu, 2 Apr 2009, Wendy Cheng wrote: > > If you have any idea what to do next, please write it. > > > Do you have your kernel source somewhere (in tar ball format) so people can > look into it ? I have created the tarballs, you can find them at http://www.kfki.hu/~kadlec/gfs/: - Kernel is vanilla 2.6.27.21, the '.config' file is preserved in the tarball as 'config'. - On top of that I installed vanilla e1000-8.0.6, e1000e-0.5.8.2 and aoe6-69. The same e1000-8.0.6 and e1000e-0.5.8.2 are used with the working cluster-2.01.00 to which the earlier aoe6-59 was added. - The cluster-2.03.11 is also the vanilla version, except that since this thread started I have added two small corrections: - fence/fenced/agent.c fixed, see https://www.redhat.com/archives/linux-cluster/2009-March/msg00222.html - gfs2/mount/umount.gfs2.c, '-l' flag support added, see https://www.redhat.com/archives/cluster-devel/2009-April/msg00000.html The configure options are in 'configure-options', the locally used init scripts can be found under deb/DEBIAN/etc ;-) The GFS volumes are mounted with noatime, quota is enabled (can't leave that off). 
The volumes are tuned with the values: statfs_slots 128 statfs_fast 1 demote_secs 30 glock_purge 50 scand_secs 3 Those are mostly remnants of the time when Maildir was in use instead of plain mailbox format and we tried to cure the terrible performance. Probably it's worth to note that 'statfs_fast 1' takes a lot of time to complete (usually around 15-20 seconds) which is, at least for me, surprising. I think that's all. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From kadlec at mail.kfki.hu Thu Apr 2 21:09:45 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Thu, 2 Apr 2009 23:09:45 +0200 (CEST) Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> Message-ID: On Thu, 2 Apr 2009, Kadlecsik Jozsef wrote: > If you have any idea what to do next, please write it. Spent again some time looking through the git commits and that triggered some wild guessing: - commit ddebb0c3dc7d0b87c402ba17731ad41abdd43f2d ? It is a temporary fix for 2.6.26, which is additionally based on a kludge and I'm trying 2.6.27/28. Might be not appropriate anymore for these kernels? - commit d9c3e59e90437567d063144bcfdbbc9fe6e8d615 ? (and other noatime related commits) Noatime handling and I do use noatime. Hm, I could try to start mailman without noatime and we'll see what happens. - commit ff7d89bfe60ed041d9342c8c9d91815c1f3d3bef ? gfs1-specific lock module, a huge patch. I could restore the gfs2_*lock* functions and check whether it helps. - commit 82d176ba485f2ef049fd303b9e41868667cebbdb gfs_drop_inode as .drop_inode replacing .put_inode. .put_inode was called without holding a lock, but .drop_inode is called under inode_lock held. Might it be a problem? What do you think? Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From s.wendy.cheng at gmail.com Thu Apr 2 21:37:19 2009 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 02 Apr 2009 16:37:19 -0500 Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> Message-ID: <49D5300F.9010805@gmail.com> Kadlecsik Jozsef wrote: > - commit 82d176ba485f2ef049fd303b9e41868667cebbdb > gfs_drop_inode as .drop_inode replacing .put_inode. > .put_inode was called without holding a lock, but .drop_inode > is called under inode_lock held. Might it be a problem? > > I was planning to take a look over the weekend .. but this one looks very promising. Give it a try and let us know ! 
-- Wendy From kadlec at mail.kfki.hu Thu Apr 2 21:45:32 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Thu, 2 Apr 2009 23:45:32 +0200 (CEST) Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: <49D5300F.9010805@gmail.com> References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> <49D5300F.9010805@gmail.com> Message-ID: On Thu, 2 Apr 2009, Wendy Cheng wrote: > Kadlecsik Jozsef wrote: > > - commit 82d176ba485f2ef049fd303b9e41868667cebbdb > > gfs_drop_inode as .drop_inode replacing .put_inode. > > .put_inode was called without holding a lock, but .drop_inode > > is called under inode_lock held. Might it be a problem? > > > I was planning to take a look over the weekend .. but this one looks very > promising. Give it a try and let us know ! But - how? .put_inode was eliminated, cannot be used anymore in recent kernels. And I have no idea what should be changed in gfs_drop_inode. Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From s.wendy.cheng at gmail.com Thu Apr 2 22:07:41 2009 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 02 Apr 2009 17:07:41 -0500 Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> <49D5300F.9010805@gmail.com> Message-ID: <49D5372D.9090709@gmail.com> Kadlecsik Jozsef wrote: > On Thu, 2 Apr 2009, Wendy Cheng wrote: > > >> Kadlecsik Jozsef wrote: >> >>> - commit 82d176ba485f2ef049fd303b9e41868667cebbdb >>> gfs_drop_inode as .drop_inode replacing .put_inode. >>> .put_inode was called without holding a lock, but .drop_inode >>> is called under inode_lock held. Might it be a problem? >>> >>> >> I was planning to take a look over the weekend .. but this one looks very >> promising. Give it a try and let us know ! >> > > But - how? .put_inode was eliminated, cannot be used anymore in recent > kernels. And I have no idea what should be changed in gfs_drop_inode. > > I see :) ... let me move your tar ball over. Know about cluster IRC (check cluster wiki for instruction if you don't know how) ? Go there - maybe some IRC folks will be able to work this with you. -- Wendy From kbphillips80 at gmail.com Thu Apr 2 22:23:51 2009 From: kbphillips80 at gmail.com (Kaerka Phillips) Date: Thu, 2 Apr 2009 18:23:51 -0400 Subject: [Linux-cluster] RHEL5.3 Cluster - backup fencing methods Message-ID: Hi - I've got an issue with a 4-node cluster, and I'm hoping to get some good advice or best-practices for this. The 4-node cluster is on dell hardware, using DRAC cards as the primary fencing device, but I'd like to eliminate the single-point of failure introduced with the cabling for this method. I attempted to use Fence_ipmilan, but once i got the fence_drac5 working, this no longer works for unknown reasons, but even if it did work, the DRACs are on a private VLAN, as is the cluster and cluster multicast address. 
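(For reference only -- a rough sketch, not a tested configuration: if fence_ipmilan can be brought back alongside fence_drac5, a second per-node method in cluster.conf gives fenced a fallback, since methods are tried in order. Node names, device names, addresses and credentials below are placeholders, and the exact agent attributes should be checked against the fence_drac5 and fence_ipmilan man pages.)

    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="node1-drac"/>
        </method>
        <method name="2">
          <device name="node1-ipmi"/>
        </method>
      </fence>
    </clusternode>
    ...
    <fencedevices>
      <fencedevice agent="fence_drac5" name="node1-drac" ipaddr="10.10.10.11" login="root" passwd="..."/>
      <fencedevice agent="fence_ipmilan" name="node1-ipmi" ipaddr="10.10.11.11" login="root" passwd="..."/>
    </fencedevices>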
I'm concerned that a failure of the switch which hosts that vlan and drac ethernet connections would cause an outright cluster failure. The point of this cluster is to share GFS2 filesystems amongst the 4-nodes. My network setup is this: 2x GigE cables plugged into PCI-E cards (two physical cards), bonded ethernet config, public network 1x GigE cable plugged into port #1 on dell server, in "shared" mode in DRAC5 card, on private network: DRAC5 card maps to this with an assigned ip address, OS also maps to this with a different assigned ip address. All cluster communication between nodes passes over private network over GigE shared port. I've not been able to determine the correct solution to eliminate this last single-point of failure, aside from adding an additional network connection to another on-board ethernet card, and mapping this to the private network, to a 2nd switch. I have no guarantee that this will work, nor much documentation to indicate what setup would be required for this. Any thoughts? Thanks, Kaerka -------------- next part -------------- An HTML attachment was scrubbed... URL: From erickson.jon at gmail.com Fri Apr 3 00:53:34 2009 From: erickson.jon at gmail.com (Jon Erickson) Date: Thu, 2 Apr 2009 20:53:34 -0400 Subject: [Linux-cluster] Unable to mount GFS File System in RHEL5.2 (32bit) In-Reply-To: <56bb44d0904020811u118479fdgbd6f0b004581f095@mail.gmail.com> References: <56bb44d0904020811u118479fdgbd6f0b004581f095@mail.gmail.com> Message-ID: <6a90e4da0904021753o16a9d996rcf8e3cb30035b796@mail.gmail.com> I'm having the same problem... When running the mount command with the '-v' option it says something about errno 19? I don't remember exactly, I can post more info tomorrow. On Thu, Apr 2, 2009 at 11:11 AM, Stevan Colaco wrote: > Dear All, > > I have setup 2 node cluster on RHEL5.2 (32bit) + Quorum Partition + > GFS Partition. > i could make gfs file system but issues while trying to mount it. is > it due to GFS module not loaded? > unable to load GFS module. > > below are the details, anyone has faced this issue before, please > suggest........ > > [root at quod-core1-uat ~]# mount -t gfs /dev/quoduat/rv /rv > /sbin/mount.gfs: error mounting /dev/mapper/quoduat-rv on /rv: No such device > [root at quod-core1-uat ~]# > > GFS rpms are installed > [root at quod-core1-uat ~]# rpm -qa | grep -i gfs > kmod-gfs2-1.92-1.1.el5 > kmod-gfs-0.1.23-5.el5 > gfs-utils-0.1.17-1.el5 > gfs2-utils-0.1.44-1.el5 > [root at quod-core1-uat ~]# > > couldn't find the gfs module loaded > [root at quod-core1-uat ~]# lsmod | grep -i gfs > gfs2 ? ? ? ? ? ? ? ? ?346344 ?1 lock_dlm > configfs ? ? ? ? ? ? ? 28753 ?2 dlm > [root at quod-core1-uat ~]# > > modinfo gfs throws below error:- > [root at quod-core1-uat ~]# modinfo gfs > modinfo: could not find module gfs > [root at quod-core1-uat ~]# > > manually locating module, list the module > [root at quod-core1-uat ~]# modinfo /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > filename: ? ? ? /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > license: ? ? ? ?GPL > author: ? ? ? ? Red Hat, Inc. > description: ? ?Global File System 0.1.23-5.el5 > srcversion: ? ? F36BE93709E650F2BEC45A5 > depends: ? ? ? ?gfs2 > vermagic: ? ? ? 2.6.18-92.el5 SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1 > [root at quod-core1-uat ~]# > > unable to install the module gfs > [root at quod-core1-uat ~]# modprobe /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > FATAL: Module /lib/modules/2.6.18_92.el5/extra/gfs/gfs.ko not found. 
> [root at quod-core1-uat ~]# > > [root at quod-core1-uat ~]# clustat > Cluster Status for quod-clust-uat @ Thu Apr ?2 18:09:45 2009 > Member Status: Quorate > > ?Member Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ID ? Status > ?------ ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ---- ------ > ?quod-core2-uat.kmefic.com.kw ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?1 > Online, rgmanager > ?quod-core1-uat.kmefic.com.kw ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?2 > Online, Local, rgmanager > ?/dev/sdc1 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 > Online, Quorum Disk > > ?Service Name ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Owner (Last) > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?State > ?------- ---- ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ----- ------ > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?----- > ?service:quod-uat-ip > quod-core1-uat.kmefic.com.kw ? ? ? ? ? ? ? ? ? ? ? ? ? ? started > [root at quod-core1-uat ~]# > > Thanks in Advance, > -Stevan Colaco > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Jon From fernando at lozano.eti.br Fri Apr 3 01:07:10 2009 From: fernando at lozano.eti.br (Fernando Lozano) Date: Thu, 02 Apr 2009 22:07:10 -0300 Subject: [Linux-cluster] rgmanager stop just hangs, clurgmgrd never terminates In-Reply-To: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C390@CSUN-EX-V01.csun.edu> References: <6708F96BBF31F846BFA56EC0AE37D62281A6E3C38F@CSUN-EX-V01.csun.edu> <49D4F7FE.5060906@lozano.eti.br> <6708F96BBF31F846BFA56EC0AE37D62281A6E3C390@CSUN-EX-V01.csun.edu> Message-ID: <49D5613E.1050403@lozano.eti.br> Arwin, Doesn't you log shows one node trying to fence the other? Clean_start prevents that at cluster startup, but on failover the survivor wants to fence the other. You may need to use fence_ack to let one node belive the other was fenced if you do not have a real fence device, for example Dell DRAC or a Network APS. []s, Fernando Lozano > Yup, matter of fact, I disabled iptables altogether. The cluster comes up fine and I have services running once again (this is a test setup btw). Just to let you know I managed to get the cluster in this state when I was doing some failover testing. I'm just wondering why when I do a /sbin/service rgmanager {stop|restart} it hangs indefinitely. > > Btw, a question about that clean_start directive. I'm reading the fenced man page and will the value of "1" prevent a fencing loop at startup. I've seen it where I bring up 1 node, and then bring up node 2 and node 2 fences node1 and I see this in the log: > []s, Fernando Lozano From kbphillips80 at gmail.com Fri Apr 3 01:44:49 2009 From: kbphillips80 at gmail.com (Kaerka Phillips) Date: Thu, 2 Apr 2009 21:44:49 -0400 Subject: [Linux-cluster] Unable to mount GFS File System in RHEL5.2 (32bit) In-Reply-To: <6a90e4da0904021753o16a9d996rcf8e3cb30035b796@mail.gmail.com> References: <56bb44d0904020811u118479fdgbd6f0b004581f095@mail.gmail.com> <6a90e4da0904021753o16a9d996rcf8e3cb30035b796@mail.gmail.com> Message-ID: It looks like there is a mix between the gfs and gfs2 filesystem and modules on your system -- your loaded module is GFS2, so perhaps try mounting with "-t gfs2", except that you will need to have made the filesystem with GFS2 as well. All of my mounted GFS2 filesystems show "gfs2" as the FS type currently mounted. 
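A minimal sketch of that route, assuming the filesystem is (re)created as gfs2 -- which erases whatever is on the volume -- with the device, mount point and cluster name taken from the transcript above and "rv" used as an example lock-table name:

    modprobe gfs2
    mkfs.gfs2 -p lock_dlm -t quod-clust-uat:rv -j 2 /dev/quoduat/rv
    mount -t gfs2 /dev/quoduat/rv /rv

If the plain-gfs route is kept instead, note that modprobe expects a module name rather than a path, so "modprobe gfs" is the form that loads /lib/modules/.../extra/gfs/gfs.ko along with its gfs2 dependency.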
You may want to remove all of the GFS items, and leave the GFS2 components (since that is the supported FS) On a RHEL5.3 system with only GFS2: # lsmod |grep gfs gfs2 524204 12 lock_dlm On a RHEL5.3 system with GFS2 only (64bit system): # modinfo gfs2 filename: /lib/modules/2.6.18-128.1.1.el5/weak-updates/gfs2/gfs2.ko license: GPL author: Red Hat, Inc. description: Global File System srcversion: 3E318153BB4A45EAE38B903 depends: vermagic: 2.6.18-92.el5 SMP mod_unload gcc-4.1 parm: scand_secs:The number of seconds between scand runs (uint) On a RHEL5.2 system with GFS2: # modinfo gfs2 filename: /lib/modules/2.6.18-92.1.13.el5PAE/kernel/fs/gfs2/gfs2.ko license: GPL author: Red Hat, Inc. description: Global File System srcversion: B09BC266DD032D7FCEA51E5 depends: vermagic: 2.6.18-92.1.13.el5PAE SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1 parm: scand_secs:The number of seconds between scand runs (uint) module_sig: 883e35048bf9999c45df68fce924fd711286f509771f47a50c8a08af053af2d178c427da1e6788409f6ae5853585f2f14ddf7f78d9fb259eac8236bd9 On Thu, Apr 2, 2009 at 8:53 PM, Jon Erickson wrote: > I'm having the same problem... > > When running the mount command with the '-v' option it says something > about errno 19? I don't remember exactly, I can post more info > tomorrow. > > > On Thu, Apr 2, 2009 at 11:11 AM, Stevan Colaco > wrote: > > Dear All, > > > > I have setup 2 node cluster on RHEL5.2 (32bit) + Quorum Partition + > > GFS Partition. > > i could make gfs file system but issues while trying to mount it. is > > it due to GFS module not loaded? > > unable to load GFS module. > > > > below are the details, anyone has faced this issue before, please > > suggest........ > > > > [root at quod-core1-uat ~]# mount -t gfs /dev/quoduat/rv /rv > > /sbin/mount.gfs: error mounting /dev/mapper/quoduat-rv on /rv: No such > device > > [root at quod-core1-uat ~]# > > > > GFS rpms are installed > > [root at quod-core1-uat ~]# rpm -qa | grep -i gfs > > kmod-gfs2-1.92-1.1.el5 > > kmod-gfs-0.1.23-5.el5 > > gfs-utils-0.1.17-1.el5 > > gfs2-utils-0.1.44-1.el5 > > [root at quod-core1-uat ~]# > > > > couldn't find the gfs module loaded > > [root at quod-core1-uat ~]# lsmod | grep -i gfs > > gfs2 346344 1 lock_dlm > > configfs 28753 2 dlm > > [root at quod-core1-uat ~]# > > > > modinfo gfs throws below error:- > > [root at quod-core1-uat ~]# modinfo gfs > > modinfo: could not find module gfs > > [root at quod-core1-uat ~]# > > > > manually locating module, list the module > > [root at quod-core1-uat ~]# modinfo > /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > > filename: /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > > license: GPL > > author: Red Hat, Inc. > > description: Global File System 0.1.23-5.el5 > > srcversion: F36BE93709E650F2BEC45A5 > > depends: gfs2 > > vermagic: 2.6.18-92.el5 SMP mod_unload 686 REGPARM 4KSTACKS gcc-4.1 > > [root at quod-core1-uat ~]# > > > > unable to install the module gfs > > [root at quod-core1-uat ~]# modprobe > /lib/modules/2.6.18-92.el5/extra/gfs/gfs.ko > > FATAL: Module /lib/modules/2.6.18_92.el5/extra/gfs/gfs.ko not found. 
> > [root at quod-core1-uat ~]# > > > > [root at quod-core1-uat ~]# clustat > > Cluster Status for quod-clust-uat @ Thu Apr 2 18:09:45 2009 > > Member Status: Quorate > > > > Member Name ID > Status > > ------ ---- ---- > ------ > > quod-core2-uat.kmefic.com.kw 1 > > Online, rgmanager > > quod-core1-uat.kmefic.com.kw 2 > > Online, Local, rgmanager > > /dev/sdc1 0 > > Online, Quorum Disk > > > > Service Name Owner (Last) > > State > > ------- ---- ----- ------ > > ----- > > service:quod-uat-ip > > quod-core1-uat.kmefic.com.kw started > > [root at quod-core1-uat ~]# > > > > Thanks in Advance, > > -Stevan Colaco > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Jon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrugeshkarnik at gmail.com Fri Apr 3 03:09:07 2009 From: mrugeshkarnik at gmail.com (Mrugesh Karnik) Date: Fri, 3 Apr 2009 08:39:07 +0530 Subject: [Linux-cluster] Network Interface Binding for cman In-Reply-To: <64D0546C5EBBD147B75DE133D798665F02FDB6D8@hugo.eprize.local> References: <200904022103.38273.mrugeshkarnik@gmail.com> <64D0546C5EBBD147B75DE133D798665F02FDB6D8@hugo.eprize.local> Message-ID: <200904030839.08056.mrugeshkarnik@gmail.com> On Thursday 02 Apr 2009 21:34:12 Jeff Sturm wrote: > It binds to a multicast address. That address is bound to one interface > normally. Well, how do I specify which interface to bind that multicast address to? I see the `bindnetaddr' directive in openais.conf man page. The cman man page tells me that the parameter from the section will overwrite. Now, I haven't been able to find any reference as to the syntax of the clusternodes directive. Also, according to the openais.conf man page, the bindnetaddr directive is a subdirective of the interface directive, which in itself a subdirective of the totem directive. So I'm wondering if it goes something like what follows: > If you need two interfaces, look into ethernet bonding. Can't, in this setup. Though, is it not at all possible to use multiple multicast addresses to bind to on different interfaces? Heartbeat, for instance, allows it. Thanks, Mrugesh From mrugeshkarnik at gmail.com Fri Apr 3 03:32:17 2009 From: mrugeshkarnik at gmail.com (Mrugesh Karnik) Date: Fri, 3 Apr 2009 09:02:17 +0530 Subject: [Linux-cluster] Network Interface Binding for cman In-Reply-To: <200904030839.08056.mrugeshkarnik@gmail.com> References: <200904022103.38273.mrugeshkarnik@gmail.com> <64D0546C5EBBD147B75DE133D798665F02FDB6D8@hugo.eprize.local> <200904030839.08056.mrugeshkarnik@gmail.com> Message-ID: <200904030902.17952.mrugeshkarnik@gmail.com> On Friday 03 Apr 2009 08:39:07 Mrugesh Karnik wrote: > On Thursday 02 Apr 2009 21:34:12 Jeff Sturm wrote: > > It binds to a multicast address. That address is bound to one interface > > normally. > > Well, how do I specify which interface to bind that multicast address to? I > see the `bindnetaddr' directive in openais.conf man page. The cman man page > tells me that the parameter from the section will overwrite. > Now, I haven't been able to find any reference as to the syntax of the > clusternodes directive. > > Also, according to the openais.conf man page, the bindnetaddr directive is > a subdirective of the interface directive, which in itself a subdirective > of the totem directive. 
So I'm wondering if it goes something like what > follows: > > > > > > > > I guess this is what I was looking for: http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html I'm reading through the wiki now. Mrugesh From s.wendy.cheng at gmail.com Fri Apr 3 03:59:52 2009 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 02 Apr 2009 22:59:52 -0500 Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> <49D5300F.9010805@gmail.com> Message-ID: <49D589B8.1050702@gmail.com> >> Kadlecsik Jozsef wrote: >> >>> - commit 82d176ba485f2ef049fd303b9e41868667cebbdb >>> gfs_drop_inode as .drop_inode replacing .put_inode. >>> .put_inode was called without holding a lock, but .drop_inode >>> is called under inode_lock held. Might it be a problem >>> Based on code reading ... 1. iput() gets inode_lock (a spin lock) 2. iput() calls iput_final() 3. iput_final() calls filesystem drop_inode(), followed by generic_drop_inode() 4. generic_drop_inode() unlock inode_lock after doing all sorts of fun things with the inode So look to me that generic_drop_inode() statement within gfs_drop_inode() should be removed. Otherwise you would get double unlock and double list free. In short, *remove* line #73 from gfs-kernel/src/gfs/ops_super.c in your source and let us know how it goes. -- Wendy From kadlec at mail.kfki.hu Fri Apr 3 06:38:12 2009 From: kadlec at mail.kfki.hu (Kadlecsik Jozsef) Date: Fri, 3 Apr 2009 08:38:12 +0200 (CEST) Subject: [Linux-cluster] Freeze with cluster-2.03.11 In-Reply-To: <49D589B8.1050702@gmail.com> References: <1404804625.1710261238184677530.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com> <1a2a6dd60903272036k7bedef6ft718cf74331f562bc@mail.gmail.com> <49CE4B36.2000103@gmail.com> <49D03447.2000901@gmail.com> <49D0EC83.4030202@gmail.com> <49D11334.5030406@redhat.com> <49D5300F.9010805@gmail.com> <49D589B8.1050702@gmail.com> Message-ID: On Thu, 2 Apr 2009, Wendy Cheng wrote: > > > Kadlecsik Jozsef wrote: > > > > > > > - commit 82d176ba485f2ef049fd303b9e41868667cebbdb > > > > gfs_drop_inode as .drop_inode replacing .put_inode. > > > > .put_inode was called without holding a lock, but .drop_inode > > > > is called under inode_lock held. Might it be a problem > > > > > Based on code reading ... > 1. iput() gets inode_lock (a spin lock) > 2. iput() calls iput_final() > 3. iput_final() calls filesystem drop_inode(), followed by > generic_drop_inode() > 4. generic_drop_inode() unlock inode_lock after doing all sorts of fun things > with the inode > > So look to me that generic_drop_inode() statement within > gfs_drop_inode() should be removed. Otherwise you would get double > unlock and double list free. I think those function calls are right: iput_final calls either the filesystem drop_inode function (in this case gfs_drop_inode) or generic_drop_inode. There's no double call of generic_drop_inode. However gfs_sync_page_i (and in turn filemap_fdatawrite and filemap_fdatawait) is now called under inode_lock held and that was not so in previous versions. But I'm just speculating. > In short, *remove* line #73 from gfs-kernel/src/gfs/ops_super.c in your > source and let us know how it goes. I won't get a chance to start a test before Monday, sorry. 
Best regards, Jozsef -- E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt Address: KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From reggaestar at gmail.com Fri Apr 3 09:16:41 2009 From: reggaestar at gmail.com (remi doubi) Date: Fri, 3 Apr 2009 09:16:41 +0000 Subject: [Linux-cluster] Virtualization on top of Centos cluster Message-ID: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com> Hi everyone, i apologize for my bad english. i'm a familiar with Linux environement ( fedora 10 user ) and i got a project in a training where i have to create a cluster with two nodes where i have to set up a number of VMs that will run applications such as ( Samba, Ldap, Zimbra, ...) but i don't know how to virtualize on top of a cluster !! i would like to know how that can be done, and how is it possible to let the VMs get ressources ( RAM & CPU ) from the two nodes ?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From binder.christian at gmx.de Fri Apr 3 09:42:38 2009 From: binder.christian at gmx.de (Christian Binder) Date: Fri, 03 Apr 2009 11:42:38 +0200 Subject: [Linux-cluster] resetting a fence - Device (ILOM) during a running Cluster (RHEL 5.2) Message-ID: <20090403094238.320830@gmx.net> Hello, we are comfortable with our 2-Node RHEL-5.2 - Cluster (some Oracle-DBs), we have subscribed for several months and which is running stable. We use two SunXFires 4200 M2 server and do the fencing with the integrated ILOMs. Unfortunatly, - because of errors on the ILOM the ILOM of the one node has to be reset, which is the advice of our hardware-vendor. The reset of the ILOM can be done in production (tested on a single machine) transparent for the OS on this machine (means: without the need of rebooting the OS of the machine.) The only thing, I noticed during the test on the single - Server is, that the ILOM is down (network not available) for about 30 sec. Does this short downtime of the fencedevice affect the Redhat Clustersoftware or can we do that action without problems in procution -time ? Thank you for your answer. Christian -- Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss f?r nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a From kirbyzhou at sohu-rd.com Fri Apr 3 10:00:56 2009 From: kirbyzhou at sohu-rd.com (Kirby Zhou) Date: Fri, 3 Apr 2009 18:00:56 +0800 Subject: [Linux-cluster] How to recover from gnbd "Open Disconnected Pending" state? In-Reply-To: <07bc01c9b35f$b5a675a0$20f360e0$@com> References: <07bc01c9b35f$b5a675a0$20f360e0$@com> Message-ID: <0c3701c9b443$14cede80$3e6c9b80$@com> How to recover from gnbd "Open Disconnected Pending" state? When the machine which exported gnbd is broken or shutdown, the client machine would fall into the state 'Open Disconnected Pending'. Any process accessing the dead gnbd will fall into state 'D'. How can I remove the dead gnbd block device on the client machine? 
#On the client machine [root at xen-727057 ~]# gnbd_import -n -l Device name : 63.131.xvdb ---------------------- Minor # : 0 sysfs name : /block/gnbd0 Server : 10.10.63.131 Port : 14567 State : Open Disconnected Pending Readonly : No Sectors : 16777216 [root at xen-727057 ~]# pvs & [1] 4561 [root at xen-727057 ~]# ps aux | fgrep pvs root 4561 0.6 0.1 79364 1544 pts/0 D 14:50 0:00 pvs root 4563 0.0 0.0 61112 608 pts/0 S+ 14:50 0:00 fgrep pvs [root at xen-727057 ~]# gnbd_import -n -R gnbd_import: ERROR cannot disconnect device #0 : Device or resource busy From jdong at redhat.com Fri Apr 3 10:18:38 2009 From: jdong at redhat.com (jdong) Date: Fri, 03 Apr 2009 18:18:38 +0800 Subject: [Linux-cluster] Virtualization on top of Centos cluster In-Reply-To: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com> References: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com> Message-ID: <49D5E27E.1040803@redhat.com> Hey remi, When you create VMs, you can assign hardware sources to VMs. Did you send the mail to ask how to create vm? If you use fedora 10,you can use qemu tool to add kvm.There is a GUI named virt-manager.It has bugs about creating kvm before.You also can use qemu-img to create image file and use qemu-kvm to assign resource. You can get details from man page.After setting up VMs,you can login them to install applications. remi doubi wrote: > Hi everyone, > i apologize for my bad english. > i'm a familiar with Linux environement ( fedora 10 user ) and i got a > project in a training where i have to create a cluster with two nodes > where i have to set up a number of VMs that will run applications such > as ( Samba, Ldap, Zimbra, ...) > but i don't know how to virtualize on top of a cluster !! > i would like to know how that can be done, and how is it possible to > let the VMs get ressources ( RAM & CPU ) from the two nodes ?? > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From reggaestar at gmail.com Fri Apr 3 10:27:46 2009 From: reggaestar at gmail.com (remi doubi) Date: Fri, 3 Apr 2009 10:27:46 +0000 Subject: Fwd: [Linux-cluster] Virtualization on top of Centos cluster In-Reply-To: <49D5E27E.1040803@redhat.com> References: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com> <49D5E27E.1040803@redhat.com> Message-ID: <3c88c73a0904030327g15876027g231105c6b860331e@mail.gmail.com> When you create VMs, you can assign hardware sources to VMs. but how can i do that, do i have to assign hadware sources manually ( which when i want to create a VM, i have to set for example the memory from the first server and the CPU from the second or the opposite) or do i have just to specify that the VM would have an amout of memory and CPU and the cluster will choose from which servers the sources will be taken ??? i will probably choose Xen instead of Qemu and the servers OS will be Centos because they told me that i have ti use GFS for shared storage. what do you think about it ? -------------- next part -------------- An HTML attachment was scrubbed... 
From neuroticimbecile at yahoo.com  Fri Apr 3 10:51:40 2009
From: neuroticimbecile at yahoo.com (eric rosel)
Date: Fri, 3 Apr 2009 03:51:40 -0700 (PDT)
Subject: [Linux-cluster] Virtualization on top of Centos cluster
In-Reply-To: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com>
Message-ID: <310778.21764.qm@web53209.mail.re2.yahoo.com>

Hi Remi,

I was able to configure such a beast last year. I used OpenVZ for virtualization, and a separate iSCSI SAN storage server (which was also an RHCS cluster) for the /vz directory where most of the OpenVZ data is stored.

Configuring it was not very complicated; I basically just had to add the resources (the OpenVZ startup script, the iSCSI device, etc.) through luci.

I used CentOS 5.2 for that project, but it didn't involve getting RAM and CPU resources from the standby node.

HTH,
-eric

--- On Fri, 4/3/09, remi doubi wrote:

> From: remi doubi
> Subject: [Linux-cluster] Virtualization on top of Centos cluster
> To: linux-cluster at redhat.com
> Date: Friday, April 3, 2009, 5:16 PM
> Hi everyone,
> I apologize for my bad English.
> I'm familiar with the Linux environment (Fedora 10 user), and I got a
> project in a training course where I have to create a cluster with two
> nodes and set up a number of VMs that will run applications such as
> Samba, LDAP, Zimbra, ...
> But I don't know how to virtualize on top of a cluster!
> I would like to know how that can be done, and whether it is possible
> to let the VMs get resources (RAM & CPU) from the two nodes.
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From reggaestar at gmail.com  Fri Apr 3 11:03:48 2009
From: reggaestar at gmail.com (remi doubi)
Date: Fri, 3 Apr 2009 11:03:48 +0000
Subject: [Linux-cluster] Virtualization on top of Centos cluster
In-Reply-To: <310778.21764.qm@web53209.mail.re2.yahoo.com>
References: <3c88c73a0904030216k351345b7pad2528ec28e9dd38@mail.gmail.com> <310778.21764.qm@web53209.mail.re2.yahoo.com>
Message-ID: <3c88c73a0904030403q486b836ej34f5265568f25261@mail.gmail.com>

Thanks eric & jdong, that's what I thought: there would certainly be a lot of problems with synchronising the resources.

But I read yesterday in a topic that a guy did that with Xen: he assigned the cluster as Dom0 and then all the VMs as DomU, and supposedly all the hardware resources of the "cluster" would be shared. Can this work?

Remi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hlawatschek at atix.de  Fri Apr 3 11:11:00 2009
From: hlawatschek at atix.de (Mark Hlawatschek)
Date: Fri, 3 Apr 2009 13:11:00 +0200
Subject: [Linux-cluster] resetting a fence - Device (ILOM) during a running Cluster (RHEL 5.2)
In-Reply-To: <20090403094238.320830@gmx.net>
References: <20090403094238.320830@gmx.net>
Message-ID: <200904031311.00393.hlawatschek@atix.de>

> we are running a two-node RHEL 5.2 cluster (hosting some Oracle DBs),
> which we have had under subscription for several months and which has
> been running stably.
>
> We use two Sun Fire X4200 M2 servers and do the fencing with the
> integrated ILOMs. Unfortunately, because of errors on the ILOM, the
> ILOM of one node has to be reset, which is the advice of our hardware
> vendor. The reset of the ILOM can be done in production (tested on a
> single machine), transparently for the OS on that machine (i.e. without
> rebooting the OS). The only thing I noticed during the test on the
> single server is that the ILOM is down (network not available) for
> about 30 seconds.
>
> Does this short downtime of the fence device affect the Red Hat cluster
> software, or can we do that action without problems during production
> time?

If there's no need for fencing the node during the 30-second period, it can be done during production time.

-Mark
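For anyone wanting to double-check the fence path around such an ILOM reset: assuming the ILOMs are driven through an IPMI-capable agent such as fence_ipmilan (the thread does not say which agent is actually configured, and the address and credentials below are placeholders), the peer's fence device can be queried from the surviving node before and after the reset:

    # Is the ILOM reachable on the network at all?
    ping -c 3 10.0.0.101

    # Ask the fence agent for the power status of the peer node
    fence_ipmilan -a 10.0.0.101 -l admin -p secret -o status

As long as neither call needs to succeed during the roughly 30 seconds the ILOM is restarting, the cluster itself is unaffected, exactly as described above.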
From reggaestar at gmail.com  Fri Apr 3 11:37:06 2009
From: reggaestar at gmail.com (remi doubi)
Date: Fri, 3 Apr 2009 11:37:06 +0000
Subject: [Linux-cluster] linux clustering
Message-ID: <3c88c73a0904030437o20924ff1n48c63926520a633@mail.gmail.com>

Here's the article: http://www.mail-archive.com/linux-cluster at redhat.com/msg05169.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From binder.christian at gmx.de  Fri Apr 3 12:59:06 2009
From: binder.christian at gmx.de (Christian Binder)
Date: Fri, 03 Apr 2009 14:59:06 +0200
Subject: [Linux-cluster] resetting a fence - Device (ILOM) during a running Cluster (RHEL 5.2)
In-Reply-To: <200904031311.00393.hlawatschek@atix.de>
References: <20090403094238.320830@gmx.net> <200904031311.00393.hlawatschek@atix.de>
Message-ID: <20090403125906.262190@gmx.net>

Thank you, Mark. The reset was successful, with no effect on the OS.

(PS: I have fond memories of your solution day in Neuss in February.)

Christian

-------- Original Message --------
> Date: Fri, 3 Apr 2009 13:11:00 +0200
> From: Mark Hlawatschek
> To: linux clustering
> Subject: Re: [Linux-cluster] resetting a fence - Device (ILOM) during a running Cluster (RHEL 5.2)
> > we are running a two-node RHEL 5.2 cluster (hosting some Oracle DBs),
> > which we have had under subscription for several months and which has
> > been running stably.
> >
> > We use two Sun Fire X4200 M2 servers and do the fencing with the
> > integrated ILOMs. Unfortunately, because of errors on the ILOM, the
> > ILOM of one node has to be reset, which is the advice of our hardware
> > vendor. The reset of the ILOM can be done in production (tested on a
> > single machine), transparently for the OS on that machine (i.e.
> > without rebooting the OS). The only thing I noticed during the test
> > on the single server is that the ILOM is down (network not available)
> > for about 30 seconds.
> >
> > Does this short downtime of the fence device affect the Red Hat
> > cluster software, or can we do that action without problems during
> > production time?
>
> If there's no need for fencing the node during the 30-second period, it
> can be done during production time.
>
> -Mark
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
--
Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a

From reggaestar at gmail.com  Fri Apr 3 13:43:14 2009
From: reggaestar at gmail.com (remi doubi)
Date: Fri, 3 Apr 2009 13:43:14 +0000
Subject: Fwd: [Linux-cluster] Virtualization on top of Centos cluster
Message-ID: <3c88c73a0904030643r3cf07114wa66a922eb6b02d3d@mail.gmail.com>

Here's the article: http://www.mail-archive.com/linux-cluster at redhat.com/msg05169.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From burton at simondsfamily.com  Fri Apr 3 13:44:35 2009
From: burton at simondsfamily.com (Burton Simonds)
Date: Fri, 3 Apr 2009 09:44:35 -0400
Subject: [Linux-cluster] Service behavior when migration fails.
Message-ID: <77f48c0d0904030644n3b87e83drfd0c75b7b2a645c3@mail.gmail.com>

I have a 2-node cluster, and I would like the following behavior:

Node 1 is running apache.
Node 2 is in standby, but has a bad apache config (I know it should be tested before going into production, but let's pretend I am a moron).
Node 1's apache is killed.
The service tries to migrate to node 2, but fails.
It tries to migrate back to node 1 and succeeds.

What is happening is that when the service tries to go back to node 1, it says it failed and states that the IP address is in use. I have the service set up as follows: