From pasik at iki.fi Wed Oct 1 13:26:41 2008 From: pasik at iki.fi (Pasi =?iso-8859-1?Q?K=E4rkk=E4inen?=) Date: Wed, 1 Oct 2008 16:26:41 +0300 Subject: [Linux-cluster] Online resizing CLVM PVs (with RHEL 5.3) Message-ID: <20081001132641.GO9714@edu.joroinen.fi> Hello list! I've been trying out RHEL 5.3 beta/test packages for kernel and device-mapper-multipath that allows online resizing of SCSI and dm-mpath devices. I've been able to successfully online resize: - iSCSI LUNs - dm-mpath device that is on top of those iSCSI LUNs - LVM PV on top of that dm-mpath device - LVM volume from that PV/VG - And ext3 filesystem on top of that LVM volume Now I'm wondering about online resizing shared/clustered CLVM PV. Should it work just like that..? Make sure all servers in the cluster see SCSI and dm-mpath devices resized, and after that just run pvresize? Thanks! -- Pasi From edoardo.causarano at laitspa.it Wed Oct 1 13:49:05 2008 From: edoardo.causarano at laitspa.it (Edoardo Causarano) Date: Wed, 1 Oct 2008 15:49:05 +0200 Subject: [Linux-cluster] fence_scsi & shutdown race Message-ID: <1222868945.6506.11.camel@ecausarano-laptop> Hi I config'd a a 2node cluster with fence_scsi on the gfs device. Works great but as soon as I (cleanly) reboot a node (say node01) the other (node02) will fence it and the shutdown init scripts will hang when GFS on the reebooting node (node01) tries to withdraw from the cluster. I have to (manually) reset the machine (node01) and it will happily rejoin. What can I do to avoid this situationm, any tunables? ciao, e From s.wendy.cheng at gmail.com Wed Oct 1 15:20:49 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 01 Oct 2008 11:20:49 -0400 Subject: [Linux-cluster] Distributed Replicated GFS shared storage In-Reply-To: <48E29731.4010604@gmail.com> References: <48E29731.4010604@gmail.com> Message-ID: <48E39551.2000201@gmail.com> Jos? Miguel Parrella Romero wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Juliano Rodrigues escribi?, en fecha 30/09/08 10:58: > >> Hello, >> >> In order to design an HA project I need a solution to replicate one GFS >> shared storage to another "hot" (standby) GFS "mirror", in case of my >> primary shared storage permanently fail. >> >> Inside RHEL Advanced Platform there is any supported way to accomplish >> that? >> > > I believe the whole point of GFS is avoiding you to spend twice your > storage capacity just for the sake of storage distribution. It already > enables you to have a standby server which can go live through a > resource manager whenever you need it. > Look like the original subject (requirement) is to have redundant (HA) storage devices. GFS alone can't accomplish this since it only deals with server nodes - as soon as the shared storage unit is gone, the filesystem will be completely unusable. Depending on the hardware, redundant storages do not necessarily consume a great deal of storage capacity. Though GFS itself does spread the blocks allocation across the whole partition (to avoid write contention between multiple clustered nodes), the underneath hardware may do things differently. That is, GFS block numbers (and its block layout) do not necessarily resemble the real disk block numbers (and the physical layout). So, say if you have a 1TB GFS partition configured but it only gets half full, you may have extra 500GB space to spare if your SAN product allows this type of over-commit. If your storage vendor supports data de-duplication, the storage consumption can go down even further. 
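For the online-resize question at the top of this digest, here is a minimal sketch of the per-node sequence, assuming an iSCSI LUN behind a dm-multipath map called mpath0 (both the map name and the use of iSCSI are placeholders for whatever the real setup uses):

# on every node in the cluster, so each one sees the new LUN size
iscsiadm -m session -R                   # rescan the iSCSI sessions
multipathd -k"resize map mpath0"         # re-read the size of the multipath map

# on one node only; clvmd distributes the updated metadata to the rest
pvresize /dev/mapper/mpath0
vgs                                      # the extra space should now appear as free extents

After that, lvextend plus gfs_grow (or resize2fs for ext3) work the same as in the single-node case.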
-- Wendy > However, if you need to have two separate storage facilities which sync > in one way, DRBD is probably the easiest way to do so. Heartbeat can > manage DRBD resources at block- and filesystem-level easily, and other > resource managers can probably do so (though I haven't used them) > > HTH, > Jose > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkjilzEACgkQUWAsjQBcO4KhwQCeM0lxhXfCwxiAigfi+39pHGog > alwAn3UilZcaPU009vaoxVhXFV6J5KqY > =IVLO > -----END PGP SIGNATURE----- > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From angelo.compagnucci at gmail.com Wed Oct 1 15:39:34 2008 From: angelo.compagnucci at gmail.com (Angelo Compagnucci) Date: Wed, 1 Oct 2008 17:39:34 +0200 Subject: [Linux-cluster] CLVM clarification Message-ID: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> Hi to all,This is my first post on this list. Thanks in advance for every answer. I've already read every guide in this matter, this is the list: Cluster_Administration.pdf Cluster_Logical_Volume_Manager.pdf Global_Network_Block_Device.pdf Cluster_Suite_Overview.pdf Global_File_System.pdf CLVM.pdf RedHatClusterAdminOverview.pdf The truth is that I've not clear a point about CLVM. Let's me make an example: In this example CLVM and the Cluster suite are fully running without problems. Let's pose the same configuration of cluster.conf and lvm.conf and the nodes of the cluster are joined and operatives. NODE1: pvcreate /dev/hda3 NODE2: pvcreate /dev/hda2 Let's pose that CLVM spans LVM metadata across the cluster, if I stroke the command: pvscan I should see /dev/sda2 and /dev/sda3 and then I can create a vg with vgcreate /dev/sda2 /dev/sda3 ... The question is: How LVM metadata sharing works? I have to use GNBD on the row partion to share a device between nodes? I can create a GFS over a spanned volume group? Are shareable only logical volumes? Thanks for your answers!! -------------- next part -------------- An HTML attachment was scrubbed... URL: From sakect at gmail.com Wed Oct 1 15:45:53 2008 From: sakect at gmail.com (POWERBALL ONLINE) Date: Wed, 1 Oct 2008 22:45:53 +0700 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> Message-ID: Please give me the error when you create the LVM in /var/log/messages. Are you running clvmd service? 2008/9/30 Terry Davis > lvm2-2.02.32-4.el5 > lvm2-cluster-2.02.32-4.el5 > > I don't have any patches available for LVM. > > 2008/9/30 POWERBALL ONLINE > > Hello, >> >> Are you already update patch? >> Because it is teh LVM bug you can find it in redhat bugzila. >> >> Best Regards, >> >> Somsak >> >> 2008/9/30 Terry Davis >> >>> Hello, >>> >>> I am having a heck of a time getting a volume to show up in my cluster. >>> I have a feeling I am doing something wrong but this isn't the first one >>> I've added so I'm not sure where I got lucky before. 
Here is what I've done >>> thus far in my 2 node RHEL5 cluster: >>> >>> 1) Created my volume in my SAN and gave both nodes access to it >>> 2) on node A: created 4TB partition with parted and made a gpt label >>> 3) on node A: pvcreate /dev/sdc1 >>> 4) on node B: vgcreate vg_data01e /dev/sdc1 >>> 5) on both nodes: vgchange -a y >>> 6) on node A: lvcreate -n lv_data01e vg_data01e >>> >>> I get the error: >>> Error locking on node omadvnfs01b: Volume group for uuid not found: >>> p9SfIjriPtXY33G1Yi3YdojvAAAzmAuwlOLqhVzX8mqL6goiVmUAgQZLGcDnX324 >>> Error locking on node omadvnfs01a: Volume group for uuid not found: >>> p9SfIjriPtXY33G1Yi3YdojvAAAzmAuwlOLqhVzX8mqL6goiVmUAgQZLGcDnX324 >>> Aborting. Failed to activate new LV to wipe the start of it. >>> >>> I tried restarting clvmd for good measure. Still no luck. What am I >>> doing wrong? >>> >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From terrybdavis at gmail.com Wed Oct 1 16:34:20 2008 From: terrybdavis at gmail.com (Terry Davis) Date: Wed, 1 Oct 2008 11:34:20 -0500 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> Message-ID: <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> I started over by deleting and recreating the volume in our SAN: [root at omadvnfs01a ~]# pvcreate /dev/sdh1 Physical volume "/dev/sdh1" successfully created [root at omadvnfs01a ~]# vgcreate -c y vg_data01e /dev/sdh1 Volume group "vg_data01e" successfully created [root at omadvnfs01a ~]# lvcreate -n lv_data01e -l100%VG vg_data01e Error locking on node omadvnfs01a: Volume group for uuid not found: FbdGWogYIYwX1IfeZcSLxoGNoGgfcZMOzeHZMf1beXTguz9JpiJifBi0dKzwG7pI Error locking on node omadvnfs01b: Volume group for uuid not found: FbdGWogYIYwX1IfeZcSLxoGNoGgfcZMOzeHZMf1beXTguz9JpiJifBi0dKzwG7pI Aborting. Failed to activate new LV to wipe the start of it. Yes, I am running the clvmd service. Frustrating. 2008/10/1 POWERBALL ONLINE > Please give me the error when you create the LVM in /var/log/messages. > Are you running clvmd service? > > 2008/9/30 Terry Davis > >> lvm2-2.02.32-4.el5 >> lvm2-cluster-2.02.32-4.el5 >> >> I don't have any patches available for LVM. >> >> 2008/9/30 POWERBALL ONLINE >> >> Hello, >>> >>> Are you already update patch? >>> Because it is teh LVM bug you can find it in redhat bugzila. >>> >>> Best Regards, >>> >>> Somsak >>> >>> 2008/9/30 Terry Davis >>> >>>> Hello, >>>> >>>> I am having a heck of a time getting a volume to show up in my cluster. >>>> I have a feeling I am doing something wrong but this isn't the first one >>>> I've added so I'm not sure where I got lucky before. 
Here is what I've done >>>> thus far in my 2 node RHEL5 cluster: >>>> >>>> 1) Created my volume in my SAN and gave both nodes access to it >>>> 2) on node A: created 4TB partition with parted and made a gpt label >>>> 3) on node A: pvcreate /dev/sdc1 >>>> 4) on node B: vgcreate vg_data01e /dev/sdc1 >>>> 5) on both nodes: vgchange -a y >>>> 6) on node A: lvcreate -n lv_data01e vg_data01e >>>> >>>> I get the error: >>>> Error locking on node omadvnfs01b: Volume group for uuid not found: >>>> p9SfIjriPtXY33G1Yi3YdojvAAAzmAuwlOLqhVzX8mqL6goiVmUAgQZLGcDnX324 >>>> Error locking on node omadvnfs01a: Volume group for uuid not found: >>>> p9SfIjriPtXY33G1Yi3YdojvAAAzmAuwlOLqhVzX8mqL6goiVmUAgQZLGcDnX324 >>>> Aborting. Failed to activate new LV to wipe the start of it. >>>> >>>> I tried restarting clvmd for good measure. Still no luck. What am I >>>> doing wrong? >>>> >>>> >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From agk at redhat.com Wed Oct 1 16:42:47 2008 From: agk at redhat.com (Alasdair G Kergon) Date: Wed, 1 Oct 2008 17:42:47 +0100 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> Message-ID: <20081001164247.GB6173@agk.fab.redhat.com> I hope that problem was fixed in newer packages. Meanwhile try running 'clvmd -R' between some of the commands. If all else fails, you may have to kill the clvmd daemons in the cluster and restart them, or even add a 'vgscan' on each node before the restart. Alasdair -- agk at redhat.com From terrybdavis at gmail.com Wed Oct 1 17:06:15 2008 From: terrybdavis at gmail.com (Terry Davis) Date: Wed, 1 Oct 2008 12:06:15 -0500 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <20081001164247.GB6173@agk.fab.redhat.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> <20081001164247.GB6173@agk.fab.redhat.com> Message-ID: <14139e3a0810011006o6084a3ddqd7786d150b5b0658@mail.gmail.com> On Wed, Oct 1, 2008 at 11:42 AM, Alasdair G Kergon wrote: > I hope that problem was fixed in newer packages. > > Meanwhile try running 'clvmd -R' between some of the commands. > > If all else fails, you may have to kill the clvmd daemons in the cluster > and restart them, or even add a 'vgscan' on each node before the restart. > > Alasdair > -- > agk at redhat.com Just a sanity check. I killed all the clvmd daemons and started clvmd back up. 
I created the PV on node A:
[root at omadvnfs01a ~]# pvcreate /dev/sdh1
Physical volume "/dev/sdh1" successfully created

Node B knows nothing of /dev/sdh1 but it does exist:
[root at omadvnfs01b ~]# ls /dev/sdh*
/dev/sdh
[root at omadvnfs01b ~]# parted /dev/sdh
GNU Parted 1.8.1
Using /dev/sdh
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p

Model: EQLOGIC 100E-00 (scsi)
Disk /dev/sdh: 4398GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  4398GB  4398GB               primary

Maybe this is why the pvcreate and vgcreate aren't tracking with Node B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Edoardo.Causarano at laitspa.it Wed Oct 1 17:18:18 2008 From: Edoardo.Causarano at laitspa.it (Edoardo Causarano) Date: Wed, 1 Oct 2008 19:18:18 +0200 Subject: [Linux-cluster] "gfs" init script configuration Message-ID: Hi all, further investigation shows that my gfs stalling on reboot is due to an incorrect specification of the filesystem in /etc/fstab. I mount it as _netdev so /etc/init.d/gfs won't pick it up. What is the correct syntax to make sure /etc/init.d/gfs will pick up the fs at the right time during shutdown (before tearing down scsi_reservation and clustering)? IE... can I peek at your fstabs? ;) E (excuse me for the outlook mail) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakub.suchy at enlogit.cz Wed Oct 1 19:04:48 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Wed, 1 Oct 2008 21:04:48 +0200 Subject: [Linux-cluster] Cisco working configuration Message-ID: <20081001190448.GB10123@aaron> Hello all, because we are still struggling with an RHEL5.2 cluster together with Cisco network infrastructure, I'd like to ask if there is somebody who has this configuration working: RHEL5.2 cluster + Cisco 6500/4500/3500 infrastructure, heartbeat going through these switches, nodes are not linked through a crossover cable. If so, can you please contact me for further details? I would very much appreciate the help.
Thank you, Jakub Suchy -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jakub.suchy at enlogit.cz Wed Oct 1 21:11:09 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Wed, 1 Oct 2008 23:11:09 +0200 Subject: [Linux-cluster] Cisco working configuration In-Reply-To: <620016859.122671222888921460.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> References: <2000762384.122241222888753010.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> <620016859.122671222888921460.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> Message-ID: <20081001211109.GA11341@aaron> Leo Pleiman wrote: > > The kbase article can be found at http://kbase.redhat.com/faq/FAQ_51_11755.shtm > It has a link to Cisco's web site enumerating 5 possible solutions. http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml Hello, I am aware of these documents and I have tried all these solutions. Jakub From accdias+cluster at gmail.com Wed Oct 1 22:30:26 2008 From: accdias+cluster at gmail.com (Antonio Dias) Date: Wed, 1 Oct 2008 19:30:26 -0300 Subject: [Linux-cluster] Cisco working configuration In-Reply-To: <20081001190448.GB10123@aaron> References: <20081001190448.GB10123@aaron> Message-ID: <204313690810011530vc5548d9g39ec11661560366c@mail.gmail.com> Hi, Jakub, On Wed, Oct 1, 2008 at 16:04, Jakub Suchy wrote: > RHEL5.2 cluster + Cisco 6500/4500/3500 infrastructure, heartbeat going > through these switches, nodes are not linked through crossover cable. I believe you just need to force all nodes to "speak" IGMPv2. Do this: for iface in /proc/sys/net/ipv4/conf/*; do echo '2' > $iface/force_igmp_version done On each node and probably it will work after. Cisco switches generally speak IGMPv2 and Linux defaults to IGMPv3. RHCS depends on multicast to work properly and multicast depends on IGMP. You will need to set this in /etc/sysctl.conf to make it persistent between reboots. Do this: cat << EOF >> /etc/sysctl.conf net.ipv4.conf.all.force_igmp_version = 2 net.ipv4.conf.default.force_igmp_version = 2 EOF Let us know if this resolve your problem. -- Antonio Dias From terrybdavis at gmail.com Wed Oct 1 23:05:24 2008 From: terrybdavis at gmail.com (Terry Davis) Date: Wed, 1 Oct 2008 18:05:24 -0500 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <14139e3a0810011006o6084a3ddqd7786d150b5b0658@mail.gmail.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> <20081001164247.GB6173@agk.fab.redhat.com> <14139e3a0810011006o6084a3ddqd7786d150b5b0658@mail.gmail.com> Message-ID: <14139e3a0810011605o3d9ca45bq918ae7b94e6c1444@mail.gmail.com> Awesome. I rebooted and applied all available updates and now it works. Only thing worth noting in the updates was a kernel update to 2.6.18-92.1.13.el5. I think a reboot did it (for some reason). On Wed, Oct 1, 2008 at 12:06 PM, Terry Davis wrote: > On Wed, Oct 1, 2008 at 11:42 AM, Alasdair G Kergon wrote: > >> I hope that problem was fixed in newer packages. >> >> Meanwhile try running 'clvmd -R' between some of the commands. >> >> If all else fails, you may have to kill the clvmd daemons in the cluster >> and restart them, or even add a 'vgscan' on each node before the restart. >> >> Alasdair >> -- >> agk at redhat.com > > > > Just a sanity check. I killed all the clvmd daemons and started clvmd back > up. 
I created the PV on node A: > [root at omadvnfs01a ~]# pvcreate /dev/sdh1 > Physical volume "/dev/sdh1" successfully created > > Node B knows nothing of /dev/sdh1 but it does exist: > [root at omadvnfs01b ~]# ls /dev/sdh* > /dev/sdh > [root at omadvnfs01b ~]# parted /dev/sdh > GNU Parted 1.8.1 > Using /dev/sdh > Welcome to GNU Parted! Type 'help' to view a list of commands. > (parted) p > > Model: EQLOGIC 100E-00 (scsi) > Disk /dev/sdh: 4398GB > Sector size (logical/physical): 512B/512B > Partition Table: gpt > > Number Start End Size File system Name Flags > 1 17.4kB 4398GB 4398GB primary > > > Maybe this is why the pvcreate and vgcreate aren't tracking with Node B. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Thu Oct 2 08:16:08 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 02 Oct 2008 10:16:08 +0200 Subject: [Linux-cluster] qdisk questions Message-ID: Hi, I have recently had a couple of situations with my cluster where both nodes were restarted simultaneously. The reasons for this are a bit beyond me so I was wondering if anyone could clarify / point me to relevant documentation. Following excerpts from both nodes logs : Oct 2 08:32:22 node1 qdiskd[3758]: Heuristic: 'ping 10.X.X.X -c1 -t2' DOWN (3/3) Oct 2 08:32:39 node1 qdiskd[3758]: Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6) Oct 2 08:32:55 node1 qdiskd[3758]: Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6) Oct 2 08:32:58 node1 qdiskd[3758]: Heuristic: 'ping X.X.X.X -c1 -t1' DOWN (6/6) Oct 2 08:33:01 node1 qdiskd[3758]: Score insufficient for master operation (0/4; required=1); downgrading Oct 2 08:33:01 node1 kernel: md: stopping all md devices. Oct 2 08:32:23 node2 qdiskd[3599]: Heuristic: 'ping 10.X.X.X -c1 -t2' DOWN (3/3) Oct 2 08:32:49 node2 qdiskd[3599]: Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6) Oct 2 08:32:56 node2 qdiskd[3599]: Heuristic: 'ping X.X.X.X -c1 -t1' DOWN (6/6) Oct 2 08:32:56 node2 qdiskd[3599]: Heuristic: 'ping X.X.X.X -c1 -t2' DOWN (6/6) Oct 2 08:33:03 node2 qdiskd[3599]: Score insufficient for master operation (0/4; required=1); downgrading Oct 2 08:33:03 node2 kernel: md: stopping all md devices. Does qdisk reboot the node due to these tests failing? The upstream routers these nodes are connected to were unavailable for at most 2 minutes, and all four pingtests require connectivity through the router (probably need to change that!?). What kind of tests can I use for qdiskd that will prevent router-outages from killing my cluster completely? Regards -- Denis From xavier.montagutelli at unilim.fr Thu Oct 2 09:43:58 2008 From: xavier.montagutelli at unilim.fr (Xavier Montagutelli) Date: Thu, 2 Oct 2008 11:43:58 +0200 Subject: [Linux-cluster] CLVM clarification In-Reply-To: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> References: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> Message-ID: <200810021143.58402.xavier.montagutelli@unilim.fr> On Wednesday 01 October 2008 17:39, Angelo Compagnucci wrote: > Hi to all,This is my first post on this list. Thanks in advance for every > answer. > > I've already read every guide in this matter, this is the list: > > Cluster_Administration.pdf > Cluster_Logical_Volume_Manager.pdf > Global_Network_Block_Device.pdf > Cluster_Suite_Overview.pdf > Global_File_System.pdf > CLVM.pdf > RedHatClusterAdminOverview.pdf > > The truth is that I've not clear a point about CLVM. 
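On the qdiskd heuristics question above, a sketch of the kind of <quorumd> stanza being discussed; every address, score and interval here is a placeholder, and the idea is simply that min_score stays reachable through a check that does not depend on the upstream router:

<quorumd interval="1" tko="10" votes="1" label="qdisk" min_score="1">
  <!-- the existing check that goes through the upstream router -->
  <heuristic program="ping -c1 -t1 10.X.X.1" score="1" interval="2" tko="3"/>
  <!-- something reachable without the router, e.g. the peer node or the local switch -->
  <heuristic program="ping -c1 -t1 10.X.X.2" score="1" interval="2" tko="3"/>
</quorumd>

With a layout like this a two-minute router outage costs one point of score but leaves the node at or above min_score, so qdiskd has no reason to downgrade it.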
> > Let's me make an example: > > In this example CLVM and the Cluster suite are fully running without > problems. Let's pose the same configuration of cluster.conf and lvm.conf > and the nodes of the cluster are joined and operatives. Does your example include a shared storage (GNBD, iSCSI, SAN, ...) ? > > NODE1: > > pvcreate /dev/hda3 > > NODE2: > > pvcreate /dev/hda2 > > Let's pose that CLVM spans LVM metadata across the cluster, if I stroke the > command: > > pvscan > > I should see /dev/sda2 and /dev/sda3 > > and then I can create a vg with > > vgcreate /dev/sda2 /dev/sda3 ... > > The question is: How LVM metadata sharing works? I have to use GNBD on the > row partion to share a device between nodes? I can create a GFS over a > spanned volume group? Are shareable only logical volumes? I have the feeling that something is not clear here. I am not an expert, but : GNBD is just a mean to export a block device on the IP network. A GNBD device is accessible to multiple nodes at the same time, and thus you can include that block device in a CLVM Volume Group. Instead of GNBD, you can also use any other shared storage (iSCSI, FC, ...). Be careful, from what I have understood, some SAN storage are not sharable between many hosts (NBD, AoE for example) ! After that, you have the choice : - to make one LV with a shared filesystem (GFS). You can then mount the same filesystem on many nodes at the same time. - to make many LV with an ext3 / xfs / ... filesystem. But you then have to make sure that one LV is mounted on only one node at a given time. But the type of filesystem is independant, this is a higher component. In this picture, CLVM is only a low-level component, avoiding the concurrent access of many nodes on the LVM metadata written on the shared storage. The data are not "spanned" across the local storage of many nodes (well, I suppose you *could* do that, but you would need other tools / layers ?) Other point : if I remember correctly, the Red Hat doc says it's not recommended to use GFS on a node that exports a GNBD device. So if you use GNBD as a shared storage, I suppose it's better to specialize one or more nodes as GNBD "servers". HTH > > Thanks for your answers!! -- Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 Service Commun Informatique Fax : +33 (0)5 55 45 75 95 Universite de Limoges 123, avenue Albert Thomas 87060 Limoges cedex From angelo.compagnucci at gmail.com Thu Oct 2 10:28:29 2008 From: angelo.compagnucci at gmail.com (Angelo Compagnucci) Date: Thu, 2 Oct 2008 12:28:29 +0200 Subject: [Linux-cluster] CLVM clarification In-Reply-To: <200810021143.58402.xavier.montagutelli@unilim.fr> References: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> <200810021143.58402.xavier.montagutelli@unilim.fr> Message-ID: <777f2ade0810020328h75710a54u36c42e4e48a01429@mail.gmail.com> Ok, this could be clear, but in the Cluster_Logical_Volume_Manager.pdf I've read (bottom of page 3): "The clmvd daemon is the key clustering extension to LVM. 
The clvmd daemon runs in each cluster computer and distributes LVM metadata updates in a cluster, presenting each cluster computer with the same view of the logical volumes" This is a picture of wath I have in mind: ----------------------------------------- | GFS filesystem | ----------------------------------------- | LV | ----------------------------------------- | VG | ----------------------------------------- | PV1 | PV2 | PV3 | ----------------------------------------- | GNBD1 | GNBD2 | GNBD3 | ----------------------------------------- | hda1 | hda1 | hda1 | | Node1 | Node2 | Node3 | ----------------------------------------- In this case the clvm features are not useful because there is only one machine (that could not be a node of a cluster) that have the lvm over GNBD exported devices. So the nodes doesn't know nothing about the other nodes. Let's pose this situation: ----------------------------------------------- | GFS | ----------------------------------------------- | LV | ----------------------------------------------- | VG1 | VG2 | ----------------------------------------------- | PV1 | PV2 | | Node1 | Node2 | ----------------------------------------------- | CLVM coordinates | ----------------------------------------------- In this situatuation makes sense to have a clustered lvm because if I have to make some maintenance over VGs, CLVM can lock and unlock the interested device. Is this the correct behaviour?? In the contrary, which is the CLVM role in a cluster? 2008/10/2 Xavier Montagutelli > On Wednesday 01 October 2008 17:39, Angelo Compagnucci wrote: > > Hi to all,This is my first post on this list. Thanks in advance for every > > answer. > > > > I've already read every guide in this matter, this is the list: > > > > Cluster_Administration.pdf > > Cluster_Logical_Volume_Manager.pdf > > Global_Network_Block_Device.pdf > > Cluster_Suite_Overview.pdf > > Global_File_System.pdf > > CLVM.pdf > > RedHatClusterAdminOverview.pdf > > > > The truth is that I've not clear a point about CLVM. > > > > Let's me make an example: > > > > In this example CLVM and the Cluster suite are fully running without > > problems. Let's pose the same configuration of cluster.conf and lvm.conf > > and the nodes of the cluster are joined and operatives. > > Does your example include a shared storage (GNBD, iSCSI, SAN, ...) ? > > > > > NODE1: > > > > pvcreate /dev/hda3 > > > > NODE2: > > > > pvcreate /dev/hda2 > > > > Let's pose that CLVM spans LVM metadata across the cluster, if I stroke > the > > command: > > > > pvscan > > > > I should see /dev/sda2 and /dev/sda3 > > > > and then I can create a vg with > > > > vgcreate /dev/sda2 /dev/sda3 ... > > > > The question is: How LVM metadata sharing works? I have to use GNBD on > the > > row partion to share a device between nodes? I can create a GFS over a > > spanned volume group? Are shareable only logical volumes? > > I have the feeling that something is not clear here. I am not an expert, > but : > > GNBD is just a mean to export a block device on the IP network. A GNBD > device > is accessible to multiple nodes at the same time, and thus you can include > that block device in a CLVM Volume Group. Instead of GNBD, you can also use > any other shared storage (iSCSI, FC, ...). Be careful, from what I have > understood, some SAN storage are not sharable between many hosts (NBD, AoE > for example) ! > > After that, you have the choice : > > - to make one LV with a shared filesystem (GFS). 
You can then mount the > same > filesystem on many nodes at the same time. > > - to make many LV with an ext3 / xfs / ... filesystem. But you then have > to > make sure that one LV is mounted on only one node at a given time. > > But the type of filesystem is independant, this is a higher component. > > In this picture, CLVM is only a low-level component, avoiding the > concurrent > access of many nodes on the LVM metadata written on the shared storage. > > The data are not "spanned" across the local storage of many nodes (well, I > suppose you *could* do that, but you would need other tools / layers ?) > > Other point : if I remember correctly, the Red Hat doc says it's not > recommended to use GFS on a node that exports a GNBD device. So if you use > GNBD as a shared storage, I suppose it's better to specialize one or more > nodes as GNBD "servers". > > > HTH > > > > > Thanks for your answers!! > > -- > Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 > Service Commun Informatique Fax : +33 (0)5 55 45 75 95 > Universite de Limoges > 123, avenue Albert Thomas > 87060 Limoges cedex > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.montagutelli at unilim.fr Thu Oct 2 12:56:29 2008 From: xavier.montagutelli at unilim.fr (Xavier Montagutelli) Date: Thu, 2 Oct 2008 14:56:29 +0200 Subject: [Linux-cluster] CLVM clarification In-Reply-To: <777f2ade0810020328h75710a54u36c42e4e48a01429@mail.gmail.com> References: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> <200810021143.58402.xavier.montagutelli@unilim.fr> <777f2ade0810020328h75710a54u36c42e4e48a01429@mail.gmail.com> Message-ID: <200810021456.29214.xavier.montagutelli@unilim.fr> On Thursday 02 October 2008 12:28, Angelo Compagnucci wrote: > Ok, this could be clear, but in the Cluster_Logical_Volume_Manager.pdf I've > read (bottom of page 3): > "The clmvd daemon is the key clustering extension to LVM. The clvmd daemon > runs in each cluster computer and distributes LVM metadata updates in a > cluster, presenting each cluster computer with the same view of the logical > volumes" > > This is a picture of wath I have in mind: This picture doesn't show the difference between a GNBD server (which doesn't know anything about the use of the exported block device : it doesn't know the VG for example) and the GNBD clients (which actually use the block device as PV). May I add some layers ? Not exactly what I have in mind but I am not a ascii art expert : --------------------------- | GFS filesystem | --------------------------- | LV | --------------------------- | VG | --------------------------- | PV1 | PV2 | PV3 | .---------.--------.--------. | CLVM | .---------.--------.--------. | cluster basis (dlm,...) | .---------.--------.--------. | Node4 | Node5 | Node6 | .---------.--------.--------. (Node4,5,6 have access to the three GNBD devices) \ | | / \___|_|___/ / | | \ / | | \ / | | \ .---------.--------.---------. | GNBD1 | GNBD2 | GNBD3 | .---------.--------.---------. | hda1 | hda1 | hda1 | | Node1 | Node2 | Node3 | .---------.--------.---------. > > In this case the clvm features are not useful because there is only one > machine (that could not be a node of a cluster) that have the lvm over GNBD > exported devices. So the nodes doesn't know nothing about the other nodes. If your GNBD* devices are accessed by only one other node. 
But if the GNBD are served to multiple nodes (nodes4,5,6), then CLVM is useful. > > Let's pose this situation: > > ----------------------------------------------- > | GFS | > ----------------------------------------------- > | LV | > ----------------------------------------------- > | VG1 | VG2 | > ----------------------------------------------- > | PV1 | PV2 | > | Node1 | Node2 | > ----------------------------------------------- > | CLVM coordinates | > ----------------------------------------------- > > In this situatuation makes sense to have a clustered lvm because if I have > to make some maintenance over VGs, CLVM can lock and unlock the interested > device. > > Is this the correct behaviour?? Perhaps I miss your point, but it doesn't make sense if the block devices are local to each node. How could Node2 have access to the block device on Node1 (showed as PV1) ? CLVM is useful only when you have a shared storage. > In the contrary, which is the CLVM role in a cluster? >From what I know, CLVM protects the metadata parts of LVM on the shared storage. And when you make one operation the shared storage on one node (for example, create a new LV), all the nodes are aware of the change. > > > 2008/10/2 Xavier Montagutelli > > > On Wednesday 01 October 2008 17:39, Angelo Compagnucci wrote: > > > Hi to all,This is my first post on this list. Thanks in advance for > > > every answer. > > > > > > I've already read every guide in this matter, this is the list: > > > > > > Cluster_Administration.pdf > > > Cluster_Logical_Volume_Manager.pdf > > > Global_Network_Block_Device.pdf > > > Cluster_Suite_Overview.pdf > > > Global_File_System.pdf > > > CLVM.pdf > > > RedHatClusterAdminOverview.pdf > > > > > > The truth is that I've not clear a point about CLVM. > > > > > > Let's me make an example: > > > > > > In this example CLVM and the Cluster suite are fully running without > > > problems. Let's pose the same configuration of cluster.conf and > > > lvm.conf and the nodes of the cluster are joined and operatives. > > > > Does your example include a shared storage (GNBD, iSCSI, SAN, ...) ? > > > > > NODE1: > > > > > > pvcreate /dev/hda3 > > > > > > NODE2: > > > > > > pvcreate /dev/hda2 > > > > > > Let's pose that CLVM spans LVM metadata across the cluster, if I stroke > > > > the > > > > > command: > > > > > > pvscan > > > > > > I should see /dev/sda2 and /dev/sda3 > > > > > > and then I can create a vg with > > > > > > vgcreate /dev/sda2 /dev/sda3 ... > > > > > > The question is: How LVM metadata sharing works? I have to use GNBD on > > > > the > > > > > row partion to share a device between nodes? I can create a GFS over a > > > spanned volume group? Are shareable only logical volumes? > > > > I have the feeling that something is not clear here. I am not an expert, > > but : > > > > GNBD is just a mean to export a block device on the IP network. A GNBD > > device > > is accessible to multiple nodes at the same time, and thus you can > > include that block device in a CLVM Volume Group. Instead of GNBD, you > > can also use any other shared storage (iSCSI, FC, ...). Be careful, from > > what I have understood, some SAN storage are not sharable between many > > hosts (NBD, AoE for example) ! > > > > After that, you have the choice : > > > > - to make one LV with a shared filesystem (GFS). You can then mount the > > same > > filesystem on many nodes at the same time. > > > > - to make many LV with an ext3 / xfs / ... filesystem. 
But you then have > > to > > make sure that one LV is mounted on only one node at a given time. > > > > But the type of filesystem is independant, this is a higher component. > > > > In this picture, CLVM is only a low-level component, avoiding the > > concurrent > > access of many nodes on the LVM metadata written on the shared storage. > > > > The data are not "spanned" across the local storage of many nodes (well, > > I suppose you *could* do that, but you would need other tools / layers ?) > > > > Other point : if I remember correctly, the Red Hat doc says it's not > > recommended to use GFS on a node that exports a GNBD device. So if you > > use GNBD as a shared storage, I suppose it's better to specialize one or > > more nodes as GNBD "servers". > > > > > > HTH > > > > > Thanks for your answers!! > > > > -- > > Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 > > Service Commun Informatique Fax : +33 (0)5 55 45 75 95 > > Universite de Limoges > > 123, avenue Albert Thomas > > 87060 Limoges cedex > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster -- Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 Service Commun Informatique Fax : +33 (0)5 55 45 75 95 Universite de Limoges 123, avenue Albert Thomas 87060 Limoges cedex From angelo.compagnucci at gmail.com Thu Oct 2 15:15:41 2008 From: angelo.compagnucci at gmail.com (Angelo Compagnucci) Date: Thu, 2 Oct 2008 17:15:41 +0200 Subject: [Linux-cluster] CLVM clarification In-Reply-To: <200810021456.29214.xavier.montagutelli@unilim.fr> References: <777f2ade0810010839y77afc32bn5038bcfb4e3c1a6d@mail.gmail.com> <200810021143.58402.xavier.montagutelli@unilim.fr> <777f2ade0810020328h75710a54u36c42e4e48a01429@mail.gmail.com> <200810021456.29214.xavier.montagutelli@unilim.fr> Message-ID: <777f2ade0810020815p26fb6a73u6dcb80a7e7ab81b5@mail.gmail.com> Sorry, but the role of CLVM is still not clear to me. CLVM shares VG metadata across a cluster and makes cluster-wide administration possible (so the Red Hat documentation says), which implies that a CLVM cluster must have a CMAN cluster up and running. So, if I already have shared storage, the only thing I can do is create a GFS filesystem on it and export it to the client machines; that way the shared storage can be accessed by multiple machines. In that scenario CLVM is not useful, because the shared locking on the filesystem is guaranteed by GFS. Now suppose I have several machines that I want to join into a cluster, and each machine has local storage that I want to share with the others to build one large storage pool. With CLVM, according to the Red Hat guide, I can create a cluster that presents "each cluster computer with the same view of the logical volumes".[1] So I would have: node 1: VG1 (local) VG2 (node2) VG3 (node3) node 2: VG1 (node1) VG2 (local) VG3 (node3) node 3: VG1 (node1) VG2 (node2) VG3 (local) This should be what the Red Hat CLVM guide means by "the same view of the logical volumes". From this point of view, Node1 is the shared storage, and in this example it is visible from all the cluster's nodes. So if I run an "lvcreate", I should see the newly created LV on the other nodes of the cluster. Is that true? If it is, GNBD is not necessary and the layout becomes really simple. Thanks for your time!
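To make the point concrete, a short sketch of what clvmd actually provides, assuming a device such as /dev/mapper/shared_lun (a made-up name) that every node can already reach over a SAN, iSCSI or GNBD; clvmd only synchronizes LVM metadata for storage like this, it does not merge each node's private /dev/hda into one big pool:

node1# pvcreate /dev/mapper/shared_lun
node1# vgcreate -c y vg_shared /dev/mapper/shared_lun    # -c y marks the VG as clustered
node1# lvcreate -n lv_data -L 100G vg_shared

node2# lvs vg_shared        # no rescan needed: clvmd has already pushed the new metadata
node2# gfs_mkfs -p lock_dlm -t mycluster:gfs_data -j 3 /dev/vg_shared/lv_data

So an lvcreate issued on one node does show up on the others, but only because the underlying block device is shared; the cluster, VG and LV names above are invented for the example.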
[1] http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Logical_Volume_Manager/LVM_Cluster_Overview.html 2008/10/2 Xavier Montagutelli > On Thursday 02 October 2008 12:28, Angelo Compagnucci wrote: > > Ok, this could be clear, but in the Cluster_Logical_Volume_Manager.pdf > I've > > read (bottom of page 3): > > "The clmvd daemon is the key clustering extension to LVM. The clvmd > daemon > > runs in each cluster computer and distributes LVM metadata updates in a > > cluster, presenting each cluster computer with the same view of the > logical > > volumes" > > > > This is a picture of wath I have in mind: > > This picture doesn't show the difference between a GNBD server (which > doesn't > know anything about the use of the exported block device : it doesn't know > the VG for example) and the GNBD clients (which actually use the block > device > as PV). May I add some layers ? Not exactly what I have in mind but I am > not > a ascii art expert : > > --------------------------- > | GFS filesystem | > --------------------------- > | LV | > --------------------------- > | VG | > --------------------------- > | PV1 | PV2 | PV3 | > .---------.--------.--------. > | CLVM | > .---------.--------.--------. > | cluster basis (dlm,...) | > .---------.--------.--------. > | Node4 | Node5 | Node6 | > .---------.--------.--------. > (Node4,5,6 have access to the three GNBD devices) > \ | | / > \___|_|___/ > / | | \ > / | | \ > / | | \ > .---------.--------.---------. > | GNBD1 | GNBD2 | GNBD3 | > .---------.--------.---------. > | hda1 | hda1 | hda1 | > | Node1 | Node2 | Node3 | > .---------.--------.---------. > > > > > In this case the clvm features are not useful because there is only one > > machine (that could not be a node of a cluster) that have the lvm over > GNBD > > exported devices. So the nodes doesn't know nothing about the other > nodes. > > If your GNBD* devices are accessed by only one other node. But if the GNBD > are > served to multiple nodes (nodes4,5,6), then CLVM is useful. > > > > > Let's pose this situation: > > > > ----------------------------------------------- > > | GFS | > > ----------------------------------------------- > > | LV | > > ----------------------------------------------- > > | VG1 | VG2 | > > ----------------------------------------------- > > | PV1 | PV2 | > > | Node1 | Node2 | > > ----------------------------------------------- > > | CLVM coordinates | > > ----------------------------------------------- > > > > In this situatuation makes sense to have a clustered lvm because if I > have > > to make some maintenance over VGs, CLVM can lock and unlock the > interested > > device. > > > > Is this the correct behaviour?? > > Perhaps I miss your point, but it doesn't make sense if the block devices > are > local to each node. How could Node2 have access to the block device on > Node1 > (showed as PV1) ? > > CLVM is useful only when you have a shared storage. > > > In the contrary, which is the CLVM role in a cluster? > > >From what I know, CLVM protects the metadata parts of LVM on the shared > storage. And when you make one operation the shared storage on one node > (for > example, create a new LV), all the nodes are aware of the change. > > > > > > > > 2008/10/2 Xavier Montagutelli > > > > > On Wednesday 01 October 2008 17:39, Angelo Compagnucci wrote: > > > > Hi to all,This is my first post on this list. Thanks in advance for > > > > every answer. 
> > > > > > > > I've already read every guide in this matter, this is the list: > > > > > > > > Cluster_Administration.pdf > > > > Cluster_Logical_Volume_Manager.pdf > > > > Global_Network_Block_Device.pdf > > > > Cluster_Suite_Overview.pdf > > > > Global_File_System.pdf > > > > CLVM.pdf > > > > RedHatClusterAdminOverview.pdf > > > > > > > > The truth is that I've not clear a point about CLVM. > > > > > > > > Let's me make an example: > > > > > > > > In this example CLVM and the Cluster suite are fully running without > > > > problems. Let's pose the same configuration of cluster.conf and > > > > lvm.conf and the nodes of the cluster are joined and operatives. > > > > > > Does your example include a shared storage (GNBD, iSCSI, SAN, ...) ? > > > > > > > NODE1: > > > > > > > > pvcreate /dev/hda3 > > > > > > > > NODE2: > > > > > > > > pvcreate /dev/hda2 > > > > > > > > Let's pose that CLVM spans LVM metadata across the cluster, if I > stroke > > > > > > the > > > > > > > command: > > > > > > > > pvscan > > > > > > > > I should see /dev/sda2 and /dev/sda3 > > > > > > > > and then I can create a vg with > > > > > > > > vgcreate /dev/sda2 /dev/sda3 ... > > > > > > > > The question is: How LVM metadata sharing works? I have to use GNBD > on > > > > > > the > > > > > > > row partion to share a device between nodes? I can create a GFS over > a > > > > spanned volume group? Are shareable only logical volumes? > > > > > > I have the feeling that something is not clear here. I am not an > expert, > > > but : > > > > > > GNBD is just a mean to export a block device on the IP network. A GNBD > > > device > > > is accessible to multiple nodes at the same time, and thus you can > > > include that block device in a CLVM Volume Group. Instead of GNBD, you > > > can also use any other shared storage (iSCSI, FC, ...). Be careful, > from > > > what I have understood, some SAN storage are not sharable between many > > > hosts (NBD, AoE for example) ! > > > > > > After that, you have the choice : > > > > > > - to make one LV with a shared filesystem (GFS). You can then mount > the > > > same > > > filesystem on many nodes at the same time. > > > > > > - to make many LV with an ext3 / xfs / ... filesystem. But you then > have > > > to > > > make sure that one LV is mounted on only one node at a given time. > > > > > > But the type of filesystem is independant, this is a higher component. > > > > > > In this picture, CLVM is only a low-level component, avoiding the > > > concurrent > > > access of many nodes on the LVM metadata written on the shared storage. > > > > > > The data are not "spanned" across the local storage of many nodes > (well, > > > I suppose you *could* do that, but you would need other tools / layers > ?) > > > > > > Other point : if I remember correctly, the Red Hat doc says it's not > > > recommended to use GFS on a node that exports a GNBD device. So if you > > > use GNBD as a shared storage, I suppose it's better to specialize one > or > > > more nodes as GNBD "servers". > > > > > > > > > HTH > > > > > > > Thanks for your answers!! 
> > > > > > -- > > > Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 > > > Service Commun Informatique Fax : +33 (0)5 55 45 75 95 > > > Universite de Limoges > > > 123, avenue Albert Thomas > > > 87060 Limoges cedex > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 > Service Commun Informatique Fax : +33 (0)5 55 45 75 95 > Universite de Limoges > 123, avenue Albert Thomas > 87060 Limoges cedex > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jruemker at redhat.com Thu Oct 2 17:24:25 2008 From: jruemker at redhat.com (John Ruemker) Date: Thu, 02 Oct 2008 13:24:25 -0400 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <14139e3a0810011605o3d9ca45bq918ae7b94e6c1444@mail.gmail.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> <20081001164247.GB6173@agk.fab.redhat.com> <14139e3a0810011006o6084a3ddqd7786d150b5b0658@mail.gmail.com> <14139e3a0810011605o3d9ca45bq918ae7b94e6c1444@mail.gmail.com> Message-ID: <48E503C9.1030003@redhat.com> Terry Davis wrote: > Awesome. I rebooted and applied all available updates and now it > works. Only thing worth noting in the updates was a kernel update to > 2.6.18-92.1.13.el5. I think a reboot did it (for some reason). > > On Wed, Oct 1, 2008 at 12:06 PM, Terry Davis > wrote: > > On Wed, Oct 1, 2008 at 11:42 AM, Alasdair G Kergon > wrote: > > I hope that problem was fixed in newer packages. > > Meanwhile try running 'clvmd -R' between some of the commands. > > If all else fails, you may have to kill the clvmd daemons in > the cluster > and restart them, or even add a 'vgscan' on each node before > the restart. > > Alasdair > -- > agk at redhat.com > > > > Just a sanity check. I killed all the clvmd daemons and started > clvmd back up. I created the PV on node A: > > [root at omadvnfs01a ~]# pvcreate /dev/sdh1 > Physical volume "/dev/sdh1" successfully created > > Node B knows nothing of /dev/sdh1 but it does exist: > [root at omadvnfs01b ~]# ls /dev/sdh* > /dev/sdh > This is the problem. If you partition the device on one node, you must do a 'partprobe' on all nodes so that they update their partition tables. Without doing this LVM has no idea what /dev/sdh1 is and therefore cannot lock on it. After running partprobe do 'clvmd -R' so that clvmd reloads its device cache and knows which devices are available. After that you can proceed with pvcreate, vgcreate, lvcreate, etc. 
John From terrybdavis at gmail.com Thu Oct 2 18:16:26 2008 From: terrybdavis at gmail.com (Terry Davis) Date: Thu, 2 Oct 2008 13:16:26 -0500 Subject: [Linux-cluster] adding volume to cluster In-Reply-To: <48E503C9.1030003@redhat.com> References: <14139e3a0809291459y6d4beb36sf69feb14c18f4b72@mail.gmail.com> <14139e3a0809300933s5014e4fp2465554c0bd1255a@mail.gmail.com> <14139e3a0810010934j65e31733kff5e45772f13a95b@mail.gmail.com> <20081001164247.GB6173@agk.fab.redhat.com> <14139e3a0810011006o6084a3ddqd7786d150b5b0658@mail.gmail.com> <14139e3a0810011605o3d9ca45bq918ae7b94e6c1444@mail.gmail.com> <48E503C9.1030003@redhat.com> Message-ID: <14139e3a0810021116n3f63c7f1v3b45702c608484b4@mail.gmail.com> On Thu, Oct 2, 2008 at 12:24 PM, John Ruemker wrote: > Terry Davis wrote: > >> Awesome. I rebooted and applied all available updates and now it works. >> Only thing worth noting in the updates was a kernel update to >> 2.6.18-92.1.13.el5. I think a reboot did it (for some reason). >> >> On Wed, Oct 1, 2008 at 12:06 PM, Terry Davis > terrybdavis at gmail.com>> wrote: >> >> On Wed, Oct 1, 2008 at 11:42 AM, Alasdair G Kergon > > wrote: >> >> I hope that problem was fixed in newer packages. >> >> Meanwhile try running 'clvmd -R' between some of the commands. >> >> If all else fails, you may have to kill the clvmd daemons in >> the cluster >> and restart them, or even add a 'vgscan' on each node before >> the restart. >> >> Alasdair >> -- >> agk at redhat.com >> >> >> >> Just a sanity check. I killed all the clvmd daemons and started >> clvmd back up. I created the PV on node A: >> >> [root at omadvnfs01a ~]# pvcreate /dev/sdh1 >> Physical volume "/dev/sdh1" successfully created >> >> Node B knows nothing of /dev/sdh1 but it does exist: >> [root at omadvnfs01b ~]# ls /dev/sdh* >> /dev/sdh >> >> > This is the problem. If you partition the device on one node, you must do > a 'partprobe' on all nodes so that they update their partition tables. > Without doing this LVM has no idea what /dev/sdh1 is and therefore cannot > lock on it. After running partprobe do 'clvmd -R' so that clvmd reloads its > device cache and knows which devices are available. After that you can > proceed with pvcreate, vgcreate, lvcreate, etc. > John Ahhhh, the step that I was missing all along. I have gone ahead and carved that into the back of my hand with a dull pencil so I don't forget next time. Thanks for the help! -------------- next part -------------- An HTML attachment was scrubbed... URL: From macscr at macscr.com Fri Oct 3 01:39:59 2008 From: macscr at macscr.com (Mark Chaney) Date: Thu, 2 Oct 2008 20:39:59 -0500 Subject: [Linux-cluster] error messages explained Message-ID: <02d501c924f8$f3eb4020$dbc1c060$@com> Cam someone explain to me these errors and tell me how I should attempt to resolve them? They both aren't happening at the same time exactly, its just to errors that I don't truly understand. #################### ccsd[3192]: Attempt to close an unopened CCS descriptor (13590). ccsd[3192]: Error while processing disconnect: Invalid request descriptor ################## openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined the cluster with existing state ################## Thanks, Mark From macscr at macscr.com Fri Oct 3 01:56:33 2008 From: macscr at macscr.com (Mark Chaney) Date: Thu, 2 Oct 2008 20:56:33 -0500 Subject: [Linux-cluster] fence daemon settings Message-ID: <02dd01c924fb$44a2a1f0$cde7e5d0$@com> I have a 3 node cluster running gfs. 
They have a private gigabit network that's used for cluster communication and backups. Can you recommend what type of settings I should have here? My nodes don't seem to rejoin the cluster to well after being fenced. Thanks, Mark From ccaulfie at redhat.com Fri Oct 3 07:38:32 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 03 Oct 2008 08:38:32 +0100 Subject: [Linux-cluster] error messages explained In-Reply-To: <02d501c924f8$f3eb4020$dbc1c060$@com> References: <02d501c924f8$f3eb4020$dbc1c060$@com> Message-ID: <48E5CBF8.4000301@redhat.com> Mark Chaney wrote: > Cam someone explain to me these errors and tell me how I should attempt to > resolve them? They both aren't happening at the same time exactly, its just > to errors that I don't truly understand. > > #################### > > ccsd[3192]: Attempt to close an unopened CCS descriptor (13590). > ccsd[3192]: Error while processing disconnect: > Invalid request descriptor > > ################## > > openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined > the cluster with existing state > I need to add this to the FAQ! What this message means is that a node was a valid member of the cluster once; it then left the cluster (without being fenced) and rejoined automatically. This can sometimes happen if the ethernet is disconnected for a time, usually a few seconds. If a node leave the cluster, it MUST rejoin using the cman_tool join command with no services running. The usual way to make this happen is to reboot the node, and if fencing is configured correctly that is what normally happens. It could be that fencing is too slow to manage this or that the cluster is made up of two nodes without a quorum disk so that the 'other' node doesn't have quorum and cannot initiate fencing. Another (more common) cause of this, is slow responding of some Cisco switches as documented here: http://www.openais.org/doku.php?id=faq:cisco_switches -- Chrissie From d.vasilets at peterhost.ru Fri Oct 3 13:30:38 2008 From: d.vasilets at peterhost.ru (=?koi8-r?Q?=F7=C1=D3=C9=CC=C5=C3_?= =?koi8-r?Q?=E4=CD=C9=D4=D2=C9=CA?=) Date: Fri, 03 Oct 2008 17:30:38 +0400 Subject: [Linux-cluster] can't compile cluster-2.03.07 Message-ID: <1223040639.10584.1.camel@dima-desktop> I try compile cluster-2.03.07 with kernel 2.6.26-5 how i can fix that ? every time report make[2]: Entering directory `/usr/src/kernels/linux-2.6.26.5' WARNING: Symbol version dump /usr/src/kernels/linux-2.6.26.5/Module.symvers is missing; modules will have no dependencies and modversions. Building modules, stage 2. MODPOST 1 modules /bin/sh: scripts/mod/modpost: No such file or directory make[3]: *** [__modpost] Error 127 make[2]: *** [modules] Error 2 make[2]: Leaving directory `/usr/src/kernels/linux-2.6.26.5' make[1]: *** [gnbd.ko] Error 2 make[1]: Leaving directory `/root/gfs/cluster/gnbd-kernel/src' make: *** [gnbd-kernel/src] Error 2 From tuckerd at engr.smu.edu Fri Oct 3 15:54:39 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Fri, 03 Oct 2008 10:54:39 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues Message-ID: <1223049279.22918.30.camel@thor.seas.smu.edu> We recently migrated from a 7 year old file server running on a single proc dec alpha running Tru64 and utilizing Truclustering for HA, to a Redhat cluster suite and gfs for HA on a dual duo core dell 2950 with 32gb ram, and have been having major performance issues. Both have fiber attached storage. The old file server grossly outperforms the new one! 
The way we are utilizing it is for nfs file serving only to multiple clients. It doesn't take many users doing much on the clients, to easily drive the load on the boxes into the 10+ range, where on the old file server it never got above 2 or 3 to perform the same tasks. The load and performance was much worse, but improved to where we are now after setting all of the volumes to statfs_fast 1. I also set nfs threads to 256, which helped some, but I don't know what more to do, and we are at the point of abandoning this platform if we cannot get it to perform reasonably. Please help! Sincerely, Doug Tucker Network and Systems Southern Methodist University From gordan at bobich.net Fri Oct 3 16:25:30 2008 From: gordan at bobich.net (Gordan Bobic) Date: Fri, 3 Oct 2008 17:25:30 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues Message-ID: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) It sounds like you have a SAN (fibre attached storage) that you are trying to turn into a NAS. That's justifiable if you have multiple mirrored SANs, but makes a mockery of HA if you only have one storage device since it leaves you with a single point of failure regardless of the number of front end nodes. Do you have a separate gigabit interface/vlan just for cluster communication? RHCS doesn't use a lot of sustained bandwidth but performance is sensitive to latencies for DLM comms. If you only have 2 nodes, a direct crossover connection would be ideal. How big is your data store? Are files large or small? Are they in few directories with lots of files (e.g. Maildir)? Load averages will go up - that's normal, since there is added latency (round trip time) from locking across nodes. Unless your CPUs is 0% idle, the servers aren't running out of steam. So don't worry about it. Also note that a clustered FS will _ALWAYS_ be slower than a non-clustered one, all things being equal. No exceptions. Also, if you are load sharing across the nodes, and you have Maildir-like file structures, it'll go slower than a purely fail-over setup, even on a clustered FS (like GFS), since there is no lock bouncing between head nodes. For extra performance, you can use a non-clustered FS as a failover resource, but be very careful with that since dual mounting a non-clustered FS will destroy the volume firtually instantly. Provided that your data isn't fundamentally unsuitable for being handled by a clustered load sharing setup, you could try increasing lock trimming and increasing the number of resource groups. Search through the archives for details on that. More suggestions when you provide more details on what your data is like. Gordan -----Original Message----- From: "Doug Tucker" To: linux-cluster at redhat.com Sent: 03/10/08 16:54 Subject: [Linux-cluster] rhcs + gfs performance issues We recently migrated from a 7 year old file server running on a single proc dec alpha running Tru64 and utilizing Truclustering for HA, to a Redhat cluster suite and gfs for HA on a dual duo core dell 2950 with 32gb ram, and have been having major performance issues. Both have fiber attached storage. The old file server grossly outperforms the new one! The way we are utilizing it is for nfs file serving only to multiple clients. It doesn't take many users doing much on the clients, to easily drive the load on the boxes into the 10+ range, where on the old file server it never got above 2 or 3 to perform the same tasks. 
The load and performance was much worse, but improved to where we are now after setting all of the volumes to statfs_fast 1. I also set nfs threads to 256, which helped some, but I don't know what more to do, and we are at the point of abandoning this platform if we cannot get it to perform reasonably. Please help! Sincerely, Doug Tucker Network and Systems Southern Methodist University -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From tuckerd at engr.smu.edu Fri Oct 3 16:56:51 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Fri, 03 Oct 2008 11:56:51 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) Message-ID: <1223053011.22918.62.camel@thor.seas.smu.edu> Thanks so much for the reply, hopefully this will lead to something. On Fri, 2008-10-03 at 17:25 +0100, Gordan Bobic wrote: > It sounds like you have a SAN (fibre attached storage) that you are trying to turn into a NAS. That's justifiable if you have multiple mirrored SANs, but makes a mockery of HA if you only have one storage device since it leaves you with a single point of failure regardless of the number of front end nodes. Understood on the san single point of failure. We're addressing HA on the front end, don't have the money to address the backend yet. Storage is something you set up once and don't have to mess with again, and doesn't do things like have application issues, etc, it's just storage. So barring a hardware issue not covered by redundant power supplies, spare disks, etc, it doesn't have issues. Having a cluster on the front end allows for failure of software on one, being able to reboot one, and provide zero downtime to the clients. > Do you have a separate gigabit interface/vlan just for cluster communication? RHCS doesn't use a lot of sustained bandwidth but performance is sensitive to latencies for DLM comms. If you only have 2 nodes, a direct crossover connection would be ideal. Not sure how to accomplish that. How do you get certain services of the cluster environment to talk over 1 interface, and other services (such as the shares) over another? The only other interface I have configured is for the fence device (dell drac cards). > How big is your data store? Are files large or small? Are they in few directories with lots of files (e.g. Maildir)? Very much mixed. We have SAS and SATA in the same SAN device, and carved out based on application performance need. Some large volumes (7TB), some small (2GB). Some large files (video) down to the mix of millions of 1k user files. > Load averages will go up - that's normal, since there is added latency (round trip time) from locking across nodes. Unless your CPUs is 0% idle, the servers aren't running out of steam. So don't worry about it. Understood. That was just the measure I used as comparison. There is definite performance lag during these higher load averages. What I was trying (and doing poorly) to communicate was that all we are doing here is serving files over nfs..we're not running apps on the cluster itself...difficult for me to understand why file serving would be so slow or ever drive load up on a box that high. And, the old file server, did not have these performance issues doing the same tasks with less hardware, bandwith, etc. > Also note that a clustered FS will _ALWAYS_ be slower than a non-clustered one, all things being equal. 
No exceptions. Also, if you are load sharing across the nodes, and you have Maildir-like file structures, it'll go slower than a purely fail-over setup, even on a clustered FS (like GFS), since in a pure fail-over setup there is no lock bouncing between head nodes. For extra performance, you can use a non-clustered FS as a failover resource, but be very careful with that, since dual mounting a non-clustered FS will destroy the volume virtually instantly.

Agreed. That's not the comparison though. Our old file server was running a clustered file system from Tru64 (AdvFS). Our expectation was that a newer technology, plus a major upgrade in hardware, would result in performance at least as good as what we had; it has not, it is far worse.

> Provided that your data isn't fundamentally unsuitable for being handled by a clustered load sharing setup, you could try increasing lock trimming and increasing the number of resource groups. Search through the archives for details on that.

Can you point me in the direction of the archives? I can't seem to find them.

> More suggestions when you provide more details on what your data is like.

My apologies for the lack of detail, I'm a bit lost as to what to provide. It's basic files, large and small. User volumes, webserver volumes, postfix mail volumes, etc. Thanks so much!

> Gordan
>

From gordan at bobich.net Fri Oct 3 17:29:33 2008
From: gordan at bobich.net (Gordan Bobic)
Date: Fri, 03 Oct 2008 18:29:33 +0100
Subject: [Linux-cluster] rhcs + gfs performance issues
In-Reply-To: <1223053011.22918.62.camel@thor.seas.smu.edu>
References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu>
Message-ID: <48E6567D.5020508@bobich.net>

Doug Tucker wrote:
>> Do you have a separate gigabit interface/vlan just for cluster communication? RHCS doesn't use a lot of sustained bandwidth but performance is sensitive to latencies for DLM comms. If you only have 2 nodes, a direct crossover connection would be ideal.
>
> Not sure how to accomplish that. How do you get certain services of the cluster environment to talk over 1 interface, and other services (such as the shares) over another? The only other interface I have configured is for the fence device (dell drac cards).

In your cluster.conf, make sure the name in the <clusternode name="..."/> section is pointing at a private crossover IP of the node. Say you have a 2nd dedicated Gb interface for the clustering: assign it an address, say 10.0.0.1, and in the hosts file have something like

10.0.0.1 node1c
10.0.0.2 node2c

That way each node in the cluster is referred to by its cluster interface name, and thus the cluster communication will go over that dedicated interface. The fail-over resources (typically client-side IPs) remain as they are on the client-side subnet.

>> How big is your data store? Are files large or small? Are they in few directories with lots of files (e.g. Maildir)?
>
> Very much mixed. We have SAS and SATA in the same SAN device, and carved out based on application performance need. Some large volumes (7TB), some small (2GB). Some large files (video) down to the mix of millions of 1k user files.

GFS copes OK with large files split across many separate directories. But if you are expecting to get fast random writes on files in the same directory, prepare to be disappointed. A write to a directory requires a directory lock, so concurrent writes to the same directory are going to have major performance issues. There isn't really any way to work around that, on any clustered FS. As long as there is no directory write contention, it should be OK, though.

>> Load averages will go up - that's normal, since there is added latency (round trip time) from locking across nodes. Unless your CPUs is 0% idle, the servers aren't running out of steam. So don't worry about it.
>
> Understood. That was just the measure I used as comparison. There is definite performance lag during these higher load averages.
What I was > trying (and doing poorly) to communicate was that all we are doing here > is serving files over nfs..we're not running apps on the cluster > itself...difficult for me to understand why file serving would be so > slow or ever drive load up on a box that high. It sounds like you are seeing write contention. Make sure you mount everything with noatime,nodiratime,noquota, both from the GFS and from the NFS clients' side. Otherwise ever read will also require a write, and that'll kill any hope of getting decent performance out of the system. > And, the old file > server, did not have these performance issues doing the same tasks with > less hardware, bandwith, etc. I'm guessing the old server was standalone, rather than clustered? >> Also note that a clustered FS will _ALWAYS_ be slower than a non-clustered >> one, all things being equal. No exceptions. Also, if you are load >> sharingacross the nodes, and you have Maildir-like file structures, >> it'll go slower than a purely fail-over setup, even on a clustered >> FS (like GFS), since there is no lock bouncing between head nodes. >> For extra performance, you can use a non-clustered FS as a failover >> resource, but be very careful with that since dual mounting a >> non-clustered FS will destroy the volume firtually instantly. > > Agreed. That's not the comaprison though. Our old file server was > running a clustered file system from Tru64 (AdvFS). Our expectation was > that a newer technology, plus a major upgrade in hardware, would result > in better performance at least than what we had, it has not, it is far > worse. I see, so you had two servers in a load-sharing write-write configuration before, too? >> Provided that your data isn't fundamentally unsuitable for being >> handled by a clustered load sharing setup, you could try >> increasing lock trimming and increasing the number of resource >> groups. Search through the archives for details on that. > > Can you point me in the direction of the archives? I can't seem to find > them? Try here: http://www.mail-archive.com/search?l=linux-cluster%40redhat.com Look for gfs lock trimming and resource group related tuning. >> More suggestions when you provide more details on what your data is like. > > My apologies for the lack of detail, I'm a bit lost as to what to > provide. It's basic files, large and small. User volumes, webserver > volumes, postfix mail volumes, etc. The important thing is to: 1) reduce the number of concurrent writes to the same directory to the maximum extent possible. 2) reduce the number of unnecessary writes (noatime,nodiratime) All writes require locks to be bounced between the nodes, and this can add a significant overhead. If you set the nodes up in a fail-over configuration, and server all the traffic from the primary node, you may see the performance improve due to locks not being bounced around all the time, they'll get set on the master node and stay there until the master node fails and it's floating IP gets migrated to the other node. Gordan From macscr at macscr.com Fri Oct 3 17:48:39 2008 From: macscr at macscr.com (Mark Chaney) Date: Fri, 3 Oct 2008 12:48:39 -0500 Subject: [Linux-cluster] error messages explained In-Reply-To: <48E5CBF8.4000301@redhat.com> References: <02d501c924f8$f3eb4020$dbc1c060$@com> <48E5CBF8.4000301@redhat.com> Message-ID: <033b01c92580$464023e0$d2c06ba0$@com> When you say, need to join with the services running. What services do I need to start in order to do this manual join? Just cman? 
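For reference, the stock RHEL 5 init scripts bring the cluster stack up in dependency order, and a node is normally rejoined by hand the same way; a minimal sketch (service names as shipped by the cman, lvm2-cluster, gfs-utils and rgmanager packages):

  # start order on a rejoining node (reverse the order to leave cleanly)
  service cman start        # joins the cluster, starts ccsd/fencing
  service clvmd start       # clustered LVM
  service gfs start         # mounts the GFS filesystems listed in /etc/fstab
  service rgmanager start   # resource group / service manager

  # to let a plain reboot do the rejoin automatically:
  chkconfig cman on; chkconfig clvmd on; chkconfig gfs on; chkconfig rgmanager on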
If a node crashes and cant rejoin. I have to hurry up (before its fenced again) and disable the auto start (chkconfig) of the following services: rgmanager, gfs, clvmd, and cman. Then reboot that node again? Then start cman and try to rejoin with just the cman_tool? The question is, if a server isn't part of a cluster anymore (aka, it was rebooted), the cluster obviously recognizes that disconnect and since the node was rebooted, it shouldn't even think its part of a cluster. So why in the world does anything think it is? All these manually changes after a simple node reboot or fencing just doesn't seem like a good design plan. I don't consider myself even moderately knowledgeable in this arena, I am just looking at this from a design perspective. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Christine Caulfield Sent: Friday, October 03, 2008 2:39 AM To: linux clustering Subject: Re: [Linux-cluster] error messages explained Mark Chaney wrote: > Cam someone explain to me these errors and tell me how I should attempt to > resolve them? They both aren't happening at the same time exactly, its just > to errors that I don't truly understand. > > #################### > > ccsd[3192]: Attempt to close an unopened CCS descriptor (13590). > ccsd[3192]: Error while processing disconnect: > Invalid request descriptor > > ################## > > openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined > the cluster with existing state > I need to add this to the FAQ! What this message means is that a node was a valid member of the cluster once; it then left the cluster (without being fenced) and rejoined automatically. This can sometimes happen if the ethernet is disconnected for a time, usually a few seconds. If a node leave the cluster, it MUST rejoin using the cman_tool join command with no services running. The usual way to make this happen is to reboot the node, and if fencing is configured correctly that is what normally happens. It could be that fencing is too slow to manage this or that the cluster is made up of two nodes without a quorum disk so that the 'other' node doesn't have quorum and cannot initiate fencing. Another (more common) cause of this, is slow responding of some Cisco switches as documented here: http://www.openais.org/doku.php?id=faq:cisco_switches -- Chrissie -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From tuckerd at engr.smu.edu Fri Oct 3 19:47:04 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Fri, 03 Oct 2008 14:47:04 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48E6567D.5020508@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> Message-ID: <1223063224.22918.102.camel@thor.seas.smu.edu> Let me say first, I appreciate your help tremendously. Let me answer some questions, and then I need to go do some homework you have suggested. > In your cluster.conf, make sure in the > > > section is pointing at a private crossover IP of the node. Say you have > 2nd dedicated Gb interface for the clustering, assign it address, say > 10.0.0.1, and in the hosts file, have something like > > 10.0.0.1 node1c > 10.0.0.2 node2c > > That way each node in the cluster is referred to by it's cluster > interface name, and thus the cluster communication will go over that > dedicated interface. 
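To make the quoted advice concrete, the two pieces involved might look like this (the interface, addresses and the node1c/node2c names are only placeholders; the clusternode name is what decides which interface carries cman/DLM traffic):

  # /etc/hosts on every node - names that resolve to the private crossover link
  10.0.0.1   node1c
  10.0.0.2   node2c

  <!-- cluster.conf - refer to the nodes by their private names -->
  <clusternodes>
    <clusternode name="node1c" nodeid="1" votes="1"/>
    <clusternode name="node2c" nodeid="2" votes="1"/>
  </clusternodes>

The client-facing virtual IP stays on the public subnet as an rgmanager resource, so only the heartbeat and lock traffic moves to the crossover link.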
> I'm not sure I understand this correctly, please bear with me, are you saying the communication runs over the fenced interface? Or that the node name should reference a seperate nic that is private, and the exported virtual ip to the clients is done over the public interface? I'm confused, I thought that definition had to be the same as the hostname of the box? Here is what is in my conf file for reference: Where as engrfs1 and 2 are the actual hostnames of the boxes. > The fail-over resources (typically client-side IPs) remain as they are > on the client-side subnet. > It sounds like you are seeing write contention. Make sure you mount > everything with noatime,nodiratime,noquota, both from the GFS and from > the NFS clients' side. Otherwise ever read will also require a write, > and that'll kill any hope of getting decent performance out of the system. Already mounted noatime, will add nodiratime. Can't do noquota, we implement quotas for ever users here (5000 or so), and did so on the old file server. > > > I'm guessing the old server was standalone, rather than clustered? No, clustered, as I assume you realized below, just making sure it's clear. > > > I see, so you had two servers in a load-sharing write-write > configuration before, too? Certainly were capable of such. However here, as we did there, we set it up in more of a failover mode. We export a virtual ip attached to the nfs export, and all clients mount the vip, so whichever machine has the vip at a given time is "master" and gets all the traffic. The only exception to this is the backups that run at night, we do on the "secondary" machine directly, rather than using the vip. And the secondary is only there in the event of a failure to node1, when node1 comes back online, it is set up to fail back to node1. > > If you set the nodes up in a fail-over configuration, and server all the > traffic from the primary node, you may see the performance improve due > to locks not being bounced around all the time, they'll get set on the > master node and stay there until the master node fails and it's floating > IP gets migrated to the other node. As explained above, exactly how it is set up. Old file server the same way. We're basically completely scratching our heads in disbelief here to a large degree. No if/ands/buts about it, hardware wise, we have 500% more box than we used to have. Configuration architecture is virtually identical. Which leaves us with the software, which leaves us with only 2 conclusions we can come up with: 1) Tru64 and TruCluster with Advfs from 7 years ago is simply that much more robust and mature than RHES4 and CS/GFS and therefore tremendously outperforms it...or 2) We have this badly configured. > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From macscr at macscr.com Fri Oct 3 22:32:00 2008 From: macscr at macscr.com (Mark Chaney) Date: Fri, 3 Oct 2008 17:32:00 -0500 Subject: [Linux-cluster] fencing single server results in two servers fenced Message-ID: <036401c925a7$dbc29b60$9347d220$@com> I have a 3 node cluster. When I fence a single server (skydive.local) from wheeljack.local, I get the following results (2 servers clustered). I can duplicate this with any of my 3 servers. Why is the second server being fenced? 
############################################## [root at wheeljack ~]# cman_tool nodes Node Sts Inc Joined Name 1 M 9580 2008-10-03 16:35:10 ratchet.local 2 M 9580 2008-10-03 16:35:10 skydive.local 3 M 9568 2008-10-03 16:35:10 wheeljack.local [root at wheeljack ~]# tail -f /var/log/messages Oct 3 16:35:15 wheeljack kernel: dlm: Using TCP for communications Oct 3 16:35:15 wheeljack kernel: dlm: got connection from 1 Oct 3 16:35:15 wheeljack kernel: dlm: got connection from 2 Oct 3 16:35:16 wheeljack clvmd: Cluster LVM daemon started - connected to CMAN Oct 3 16:35:16 wheeljack multipathd: dm-3: add map (uevent) Oct 3 16:35:16 wheeljack multipathd: dm-4: add map (uevent) Oct 3 16:35:16 wheeljack multipathd: dm-5: add map (uevent) Oct 3 16:35:16 wheeljack multipathd: dm-6: add map (uevent) Oct 3 16:35:16 wheeljack multipathd: dm-7: add map (uevent) Oct 3 16:35:17 wheeljack clurgmgrd[5407]: Resource Group Manager Starting Oct 3 16:37:15 wheeljack ntpd[3756]: synchronized to LOCAL(0), stratum 10 Oct 3 16:37:15 wheeljack ntpd[3756]: kernel time sync enabled 0001 Oct 3 16:38:19 wheeljack ntpd[3756]: synchronized to 66.79.149.35, stratum 2 Oct 3 16:42:42 wheeljack ntpd[3756]: synchronized to 64.202.112.65, stratum 2 Oct 3 16:44:05 wheeljack openais[5217]: [TOTEM] entering GATHER state from 12. Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] entering GATHER state from 11. Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] Saving state aru 52 high seq received 52 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] Storing new sequence id for ring 2570 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] entering COMMIT state. Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] entering RECOVERY state. Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] position [0] member 192.168.1.10: Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] previous ring seq 9580 rep 192.168.1.10 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] aru 52 high delivered 52 received flag 1 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] position [1] member 192.168.1.11: Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] previous ring seq 9580 rep 192.168.1.10 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] aru 52 high delivered 52 received flag 1 Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] Did not need to originate any messages in recovery. Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] CLM CONFIGURATION CHANGE Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] New Configuration: Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] r(0) ip(192.168.1.10) Oct 3 16:44:10 wheeljack kernel: dlm: closing connection to node 2 Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] r(0) ip(192.168.1.11) Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] Members Left: Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] r(0) ip(192.168.1.12) Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] Members Joined: Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] CLM CONFIGURATION CHANGE Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] New Configuration: Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] r(0) ip(192.168.1.10) Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] r(0) ip(192.168.1.11) Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] Members Left: Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] Members Joined: Oct 3 16:44:10 wheeljack openais[5217]: [SYNC ] This node is within the primary component and will provide service. Oct 3 16:44:10 wheeljack openais[5217]: [TOTEM] entering OPERATIONAL state. 
Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] got nodejoin message 192.168.1.10 Oct 3 16:44:10 wheeljack openais[5217]: [CLM ] got nodejoin message 192.168.1.11 Oct 3 16:44:10 wheeljack openais[5217]: [CPG ] got joinlist message from node 3 Oct 3 16:44:10 wheeljack openais[5217]: [CPG ] got joinlist message from node 1 Oct 3 16:44:10 wheeljack fenced[5236]: fencing deferred to ratchet.local [root at wheeljack ~]# Message from syslogd@ at Fri Oct 3 16:48:12 2008 ... wheeljack clurgmgrd[5407]: #1: Quorum Dissolved Message from syslogd@ at Fri Oct 3 16:52:55 2008 ... wheeljack clurgmgrd[5407]: #1: Quorum Dissolved From orkcu at yahoo.com Sat Oct 4 00:14:06 2008 From: orkcu at yahoo.com (Roger Pena Escobio) Date: Fri, 3 Oct 2008 17:14:06 -0700 (PDT) Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223063224.22918.102.camel@thor.seas.smu.edu> Message-ID: <24956.47991.qm@web88307.mail.re4.yahoo.com> --- On Fri, 10/3/08, Doug Tucker wrote: > From: Doug Tucker > Subject: Re: [Linux-cluster] rhcs + gfs performance issues > To: "linux clustering" > Received: Friday, October 3, 2008, 3:47 PM > Let me say first, I appreciate your help tremendously. Let > me answer > some questions, and then I need to go do some homework you > have > suggested. > > > > In your cluster.conf, make sure in the > > > > > > > section is pointing at a private crossover IP of the > node. Say you have > > 2nd dedicated Gb interface for the clustering, assign > it address, say > > 10.0.0.1, and in the hosts file, have something like > > > > 10.0.0.1 node1c > > 10.0.0.2 node2c > > > > That way each node in the cluster is referred to by > it's cluster > > interface name, and thus the cluster communication > will go over that > > dedicated interface. > > > I'm not sure I understand this correctly, please bear > with me, are you > saying the communication runs over the fenced interface? > Or that the > node name should reference a seperate nic that is private, > and the > exported virtual ip to the clients is done over the public > interface? > I'm confused, I thought that definition had to be the > same as the > hostname of the box? Here is what is in my conf file for > reference: > > nodeid="1" votes="1"> > > name="1"> > modulename="" > name="engrfs1drac"/> > > > > name="engrfs2.seas.smu.edu" nodeid="2" > votes="1"> > > name="1"> > modulename="" > name="engrfs2drac"/> > > > > Where as engrfs1 and 2 are the actual hostnames of the > boxes. the cluster will do the hearbeat and internal communication through the interface that the "nodes" (declared in cluster.conf) are reachable that mean that you can use "internal" names (name declared in /etc/hosts in every node) just for the cluster communication > > > > I see, so you had two servers in a load-sharing > write-write > > configuration before, too? > Certainly were capable of such. However here, as we did > there, we set > it up in more of a failover mode. We export a virtual ip > attached to > the nfs export, and all clients mount the vip, so whichever > machine has > the vip at a given time is "master" and gets all > the traffic. The only > exception to this is the backups that run at night, we do > on the > "secondary" machine directly, rather than using > the vip. And the > secondary is only there in the event of a failure to node1, > when node1 > comes back online, it is set up to fail back to node1. 
> > > > > > If you set the nodes up in a fail-over configuration, > and server all the > > traffic from the primary node, you may see the > performance improve due > > to locks not being bounced around all the time, > they'll get set on the > > master node and stay there until the master node fails > and it's floating > > IP gets migrated to the other node. > As explained above, exactly how it is set up. Old file > server the same then I suggest you to define the service as failover (active-pasive) but the fs as GFS so you can mount it on-demand when you need to made the backup then umount it that way during normal workload the cluster nodes will not need to agree when reading/writing to the FS and, also you should measure the performance of the NFS standalone linux server compared to the tru64 nfs server, maybe the big performance degradation you are noticing is not the cluster layer thanks roger From ccaulfie at redhat.com Sat Oct 4 14:28:59 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Sat, 04 Oct 2008 15:28:59 +0100 Subject: [Linux-cluster] error messages explained In-Reply-To: <033b01c92580$464023e0$d2c06ba0$@com> References: <02d501c924f8$f3eb4020$dbc1c060$@com> <48E5CBF8.4000301@redhat.com> <033b01c92580$464023e0$d2c06ba0$@com> Message-ID: <48E77DAB.5000000@redhat.com> Mark Chaney wrote: > When you say, need to join with the services running. What services do I > need to start in order to do this manual join? Just cman? If a node crashes > and cant rejoin. I have to hurry up (before its fenced again) and disable > the auto start (chkconfig) of the following services: rgmanager, gfs, clvmd, > and cman. Then reboot that node again? Then start cman and try to rejoin > with just the cman_tool? > > The question is, if a server isn't part of a cluster anymore (aka, it was > rebooted), the cluster obviously recognizes that disconnect and since the > node was rebooted, it shouldn't even think its part of a cluster. So why in > the world does anything think it is? > > All these manually changes after a simple node reboot or fencing just > doesn't seem like a good design plan. I don't consider myself even > moderately knowledgeable in this arena, I am just looking at this from a > design perspective. > I think you have misunderstood my. The point is that if a node leaves the cluster it really should be rebooted and join the cluster cleanly that way. There is no manual involvement at all. That's what the init scripts are for and why they are run at startup. > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Christine Caulfield > Sent: Friday, October 03, 2008 2:39 AM > To: linux clustering > Subject: Re: [Linux-cluster] error messages explained > > Mark Chaney wrote: >> Cam someone explain to me these errors and tell me how I should attempt to >> resolve them? They both aren't happening at the same time exactly, its > just >> to errors that I don't truly understand. >> >> #################### >> >> ccsd[3192]: Attempt to close an unopened CCS descriptor (13590). >> ccsd[3192]: Error while processing disconnect: >> Invalid request descriptor >> >> ################## >> >> openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined >> the cluster with existing state >> > > I need to add this to the FAQ! > > What this message means is that a node was a valid member of the cluster > once; it then left the cluster (without being fenced) and rejoined > automatically. 
This can sometimes happen if the ethernet is disconnected > for a time, usually a few seconds. > > If a node leave the cluster, it MUST rejoin using the cman_tool join > command with no services running. The usual way to make this happen is > to reboot the node, and if fencing is configured correctly that is what > normally happens. It could be that fencing is too slow to manage this or > that the cluster is made up of two nodes without a quorum disk so that > the 'other' node doesn't have quorum and cannot initiate fencing. > > Another (more common) cause of this, is slow responding of some Cisco > switches as documented here: > > http://www.openais.org/doku.php?id=faq:cisco_switches > > -- Chrissie From gordan at bobich.net Sat Oct 4 17:32:58 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sat, 04 Oct 2008 18:32:58 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223063224.22918.102.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> Message-ID: <48E7A8CA.1000502@bobich.net> Doug Tucker wrote: >> In your cluster.conf, make sure in the >> >> > >> section is pointing at a private crossover IP of the node. Say you have >> 2nd dedicated Gb interface for the clustering, assign it address, say >> 10.0.0.1, and in the hosts file, have something like >> >> 10.0.0.1 node1c >> 10.0.0.2 node2c >> >> That way each node in the cluster is referred to by it's cluster >> interface name, and thus the cluster communication will go over that >> dedicated interface. >> > I'm not sure I understand this correctly, please bear with me, are you > saying the communication runs over the fenced interface? No, over a dedicated, separate interface. > Or that the > node name should reference a seperate nic that is private, and the > exported virtual ip to the clients is done over the public interface? That's the one. > I'm confused, I thought that definition had to be the same as the > hostname of the box? No. The floating IPs will get assigned to whatever interface has the IP on that subnet. The cluster/DLM comms interface is inferred by the node name. > Here is what is in my conf file for reference: > > > > > name="engrfs1drac"/> > > > > votes="1"> > > > name="engrfs2drac"/> > > > > Where as engrfs1 and 2 are the actual hostnames of the boxes. Add another NIC in, give it a private IP/subnet, and put it in the hosts file on both nodes as something like engrfs1-cluster.seas.smu.edu, and put that in the clusternode name entry. >> The fail-over resources (typically client-side IPs) remain as they are >> on the client-side subnet. > >> It sounds like you are seeing write contention. Make sure you mount >> everything with noatime,nodiratime,noquota, both from the GFS and from >> the NFS clients' side. Otherwise ever read will also require a write, >> and that'll kill any hope of getting decent performance out of the system. > > Already mounted noatime, will add nodiratime. Can't do noquota, we > implement quotas for ever users here (5000 or so), and did so on the old > file server. > >> I'm guessing the old server was standalone, rather than clustered? > > No, clustered, as I assume you realized below, just making sure it's > clear. OK, noted. >> I see, so you had two servers in a load-sharing write-write >> configuration before, too? > > Certainly were capable of such. 
However here, as we did there, we set > it up in more of a failover mode. We export a virtual ip attached to > the nfs export, and all clients mount the vip, so whichever machine has > the vip at a given time is "master" and gets all the traffic. The only > exception to this is the backups that run at night, we do on the > "secondary" machine directly, rather than using the vip. And the > secondary is only there in the event of a failure to node1, when node1 > comes back online, it is set up to fail back to node1. OK, that should be fine, although you may find there's less of a performance hit if you do the backup from the master node, too, as that'll already have the locks on all the files. >> If you set the nodes up in a fail-over configuration, and server all the >> traffic from the primary node, you may see the performance improve due >> to locks not being bounced around all the time, they'll get set on the >> master node and stay there until the master node fails and it's floating >> IP gets migrated to the other node. > > As explained above, exactly how it is set up. Old file server the same > way. We're basically completely scratching our heads in disbelief here > to a large degree. No if/ands/buts about it, hardware wise, we have > 500% more box than we used to have. Configuration architecture is > virtually identical. Which leaves us with the software, which leaves us > with only 2 conclusions we can come up with: > > 1) Tru64 and TruCluster with Advfs from 7 years ago is simply that much > more robust and mature than RHES4 and CS/GFS and therefore tremendously > outperforms it...or RHEL4 is quite old. It's been a while since I used it for clustering. RHEL5 has yielded considerably better performance in my experience. > 2) We have this badly configured. There isn't all that much to tune on RHEL4 cluster-wise, most of the tweakability has been added more recently than I've last used it. I'd say RHEL5 is certainly worth trying. The problem you are having may just go away. Gordan From macscr at macscr.com Sat Oct 4 17:38:13 2008 From: macscr at macscr.com (Mark Chaney) Date: Sat, 4 Oct 2008 12:38:13 -0500 Subject: [Linux-cluster] error messages explained In-Reply-To: <48E77DAB.5000000@redhat.com> References: <02d501c924f8$f3eb4020$dbc1c060$@com> <48E5CBF8.4000301@redhat.com> <033b01c92580$464023e0$d2c06ba0$@com> <48E77DAB.5000000@redhat.com> Message-ID: <03a901c92647$fb81efa0$f285cee0$@com> Unfortunately simply rebooting has never resolved those errors. =/. I am getting these errors after a server is fenced and is rebooted. Then its fenced again, still same errors. I basically have to shutdown the entire cluster manually, reboot with all init scripts off, then have manually start all cluster services and add the services back to chkconfig. This is basically the process I have to do 95% of the time when a single server is fenced. =/ -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Christine Caulfield Sent: Saturday, October 04, 2008 9:29 AM To: linux clustering Subject: Re: [Linux-cluster] error messages explained Mark Chaney wrote: > When you say, need to join with the services running. What services do I > need to start in order to do this manual join? Just cman? If a node crashes > and cant rejoin. I have to hurry up (before its fenced again) and disable > the auto start (chkconfig) of the following services: rgmanager, gfs, clvmd, > and cman. Then reboot that node again? 
Then start cman and try to rejoin > with just the cman_tool? > > The question is, if a server isn't part of a cluster anymore (aka, it was > rebooted), the cluster obviously recognizes that disconnect and since the > node was rebooted, it shouldn't even think its part of a cluster. So why in > the world does anything think it is? > > All these manually changes after a simple node reboot or fencing just > doesn't seem like a good design plan. I don't consider myself even > moderately knowledgeable in this arena, I am just looking at this from a > design perspective. > I think you have misunderstood my. The point is that if a node leaves the cluster it really should be rebooted and join the cluster cleanly that way. There is no manual involvement at all. That's what the init scripts are for and why they are run at startup. > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Christine Caulfield > Sent: Friday, October 03, 2008 2:39 AM > To: linux clustering > Subject: Re: [Linux-cluster] error messages explained > > Mark Chaney wrote: >> Cam someone explain to me these errors and tell me how I should attempt to >> resolve them? They both aren't happening at the same time exactly, its > just >> to errors that I don't truly understand. >> >> #################### >> >> ccsd[3192]: Attempt to close an unopened CCS descriptor (13590). >> ccsd[3192]: Error while processing disconnect: >> Invalid request descriptor >> >> ################## >> >> openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined >> the cluster with existing state >> > > I need to add this to the FAQ! > > What this message means is that a node was a valid member of the cluster > once; it then left the cluster (without being fenced) and rejoined > automatically. This can sometimes happen if the ethernet is disconnected > for a time, usually a few seconds. > > If a node leave the cluster, it MUST rejoin using the cman_tool join > command with no services running. The usual way to make this happen is > to reboot the node, and if fencing is configured correctly that is what > normally happens. It could be that fencing is too slow to manage this or > that the cluster is made up of two nodes without a quorum disk so that > the 'other' node doesn't have quorum and cannot initiate fencing. > > Another (more common) cause of this, is slow responding of some Cisco > switches as documented here: > > http://www.openais.org/doku.php?id=faq:cisco_switches > > -- Chrissie -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From d.vasilets at peterhost.ru Mon Oct 6 07:22:09 2008 From: d.vasilets at peterhost.ru (=?koi8-r?Q?=F7=C1=D3=C9=CC=C5=C3_?= =?koi8-r?Q?=E4=CD=C9=D4=D2=C9=CA?=) Date: Mon, 06 Oct 2008 11:22:09 +0400 Subject: [Linux-cluster] where i can find define of "volume_id_get_type" Message-ID: <1223277729.10374.2.camel@dima-desktop> i try compile last version of cluster package have error " undefined reference to `volume_id_get_type'" where i can find define of this function ? From edoardo.causarano at laitspa.it Mon Oct 6 09:00:27 2008 From: edoardo.causarano at laitspa.it (Edoardo Causarano) Date: Mon, 6 Oct 2008 11:00:27 +0200 Subject: [Linux-cluster] "gfs" init script configuration In-Reply-To: References: Message-ID: <1223283627.6492.1.camel@ecausarano-laptop> Hi all, can anyone help me on this issue I mentioned in my pevious email? 
e On mer, 2008-10-01 at 19:18 +0200, Edoardo Causarano wrote: > Hi all, > > > > further investigation shows that my gfs stalling on reboot is due to > incorrect specification of the filesystem in /ec/fstab. I mount is as > _netdev so /etc/init.d/gfs won?t pick it up. > > > > What is the correct syntax to make sure /etc/init.d/gfs will pick up > the fs at the right time during shutdown (before tearing down > scsi_reservation and clustering)? > > > > IE? can I peek at your fstabs? ;) > > > > E > > (excuse me for the outlook mail) > > > Documento in testo semplice attachment (ATT265888.txt) > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From swhiteho at redhat.com Mon Oct 6 09:20:45 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 06 Oct 2008 10:20:45 +0100 Subject: [Linux-cluster] "gfs" init script configuration In-Reply-To: <1223283627.6492.1.camel@ecausarano-laptop> References: <1223283627.6492.1.camel@ecausarano-laptop> Message-ID: <1223284845.3540.7.camel@localhost.localdomain> Hi, Please see bugs #207697, #246933, #435906 and #435945. Also #456476 is the same bug but for FUSE. So you might need to write your own script to solve the problem. You can kind of use _netdev, but it does have limitations and there is currently no way to solve the shutdown problem for filesystems which were not mounted by the initscripts, Steve. On Mon, 2008-10-06 at 11:00 +0200, Edoardo Causarano wrote: > Hi all, > > can anyone help me on this issue I mentioned in my pevious email? > > e > > > On mer, 2008-10-01 at 19:18 +0200, Edoardo Causarano wrote: > > Hi all, > > > > > > > > further investigation shows that my gfs stalling on reboot is due to > > incorrect specification of the filesystem in /ec/fstab. I mount is as > > _netdev so /etc/init.d/gfs won?t pick it up. > > > > > > > > What is the correct syntax to make sure /etc/init.d/gfs will pick up > > the fs at the right time during shutdown (before tearing down > > scsi_reservation and clustering)? > > > > > > > > IE? can I peek at your fstabs? ;) > > > > > > > > E > > > > (excuse me for the outlook mail) > > > > > > Documento in testo semplice attachment (ATT265888.txt) > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From tuckerd at engr.smu.edu Mon Oct 6 15:21:42 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Mon, 06 Oct 2008 10:21:42 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48E7A8CA.1000502@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> Message-ID: <1223306502.29679.12.camel@thor.seas.smu.edu> > > 1) Tru64 and TruCluster with Advfs from 7 years ago is simply that much > > more robust and mature than RHES4 and CS/GFS and therefore tremendously > > outperforms it...or > > RHEL4 is quite old. It's been a while since I used it for clustering. > RHEL5 has yielded considerably better performance in my experience. Interesting. The only way I can see upgrading is: 1) using the upgrade options on the production node (scary, and downtime) or 2) do you know if RHEL5 will participate in a RHEL4 cluster? 
if so, I could add an RHEL5 node, make it master, and then have the ability to upgrade the other 2 one at a time. > > > 2) We have this badly configured. > > There isn't all that much to tune on RHEL4 cluster-wise, most of the > tweakability has been added more recently than I've last used it. I'd > say RHEL5 is certainly worth trying. The problem you are having may just > go away. Bummer! > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From michael.osullivan at auckland.ac.nz Mon Oct 6 15:28:22 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Tue, 7 Oct 2008 04:28:22 +1300 (NZDT) Subject: [Linux-cluster] Can't create LV in 2-node cluster Message-ID: <1132.128.187.153.180.1223306902.squirrel@mail.esc.auckland.ac.nz> Hi everyone, I have created a 2-node cluster and (after a little difficulty) created a clustered volume group visible on both nodes. However, I can't create a logical volume on either node. I get the following error: Error locking on node : Error backing up metadata, can't find VG for group #global Aborting. Failed to activate new LV to wipe the start of it. I have locking_type = 3 and locking_library = liblvm2clusterlock.so. The metadata areas and sequence nos are the same using vgdisplay on both nodes. Can anyone help me create LVs? I am happy to provide any extra info needed. Thanks, Mike From jeff.sturm at eprize.com Mon Oct 6 16:01:32 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Mon, 6 Oct 2008 12:01:32 -0400 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223306502.29679.12.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added bypostmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu><48E6567D.5020508@bobich.net><1223063224.22918.102.camel@thor.seas.smu.edu><48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> Message-ID: <64D0546C5EBBD147B75DE133D798665F01806522@hugo.eprize.local> > 2) do you know if RHEL5 will participate in a RHEL4 cluster? Not with incompatible lock modules (DLM vs. GULM). Sorry. I don't beleve there's a way to upgrade while the cluster is online. From macscr at macscr.com Mon Oct 6 16:03:57 2008 From: macscr at macscr.com (Mark Chaney) Date: Mon, 6 Oct 2008 11:03:57 -0500 Subject: [Linux-cluster] Can't create LV in 2-node cluster In-Reply-To: <1132.128.187.153.180.1223306902.squirrel@mail.esc.auckland.ac.nz> References: <1132.128.187.153.180.1223306902.squirrel@mail.esc.auckland.ac.nz> Message-ID: <04cb01c927cd$24b23ed0$6e16bc70$@com> What are you using for shared storage? Also, 2 node clusters are highly discouraged from my experience and recommendations of others. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of michael.osullivan at auckland.ac.nz Sent: Monday, October 06, 2008 10:28 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] Can't create LV in 2-node cluster Hi everyone, I have created a 2-node cluster and (after a little difficulty) created a clustered volume group visible on both nodes. However, I can't create a logical volume on either node. I get the following error: Error locking on node : Error backing up metadata, can't find VG for group #global Aborting. Failed to activate new LV to wipe the start of it. I have locking_type = 3 and locking_library = liblvm2clusterlock.so. 
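(For reference, those two settings live in the global section of /etc/lvm/lvm.conf on every node; a minimal sketch of the stanza being described:)

  global {
      locking_type = 3                            # cluster-wide locking via clvmd
      locking_library = "liblvm2clusterlock.so"   # shipped in the lvm2-cluster package
  }

With this locking type, clvmd has to be running on all cluster nodes for LV creation to succeed.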
The metadata areas and sequence nos are the same using vgdisplay on both nodes. Can anyone help me create LVs? I am happy to provide any extra info needed. Thanks, Mike -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From mulach at libero.it Mon Oct 6 17:04:23 2008 From: mulach at libero.it (mulach) Date: Mon, 6 Oct 2008 19:04:23 +0200 Subject: R: [Linux-cluster] Cluster Centos - Don't switch resource In-Reply-To: <200809221730.45084.xavier.montagutelli@unilim.fr> References: <200809221730.45084.xavier.montagutelli@unilim.fr> Message-ID: <44C2C1BCF1DA4FBBAE543EFE9F4CB6AD@invitto> It works fine, but i have a doubt: I must install heartbeat? If not in wich case i do it? Tnks -----Messaggio originale----- Da: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Per conto di Xavier Montagutelli Inviato: luned? 22 settembre 2008 17.31 A: linux clustering Oggetto: Re: [Linux-cluster] Cluster Centos - Don't switch resource On Monday 22 September 2008 08:56, mulach at libero.it wrote: > Hi, > > in Centos 5.2 i had create a cluster with Conga(two node has been in vmware > server). The problem is that when a node fail, don't switch the service. > Below the cluster.conf > > -------------------------------- > > > > > > > > > > > > > > > If I understand correctly your cluster.conf file, you don't have fencing devices for your nodes. In my tests, I **had** to define fencing methods for the service to switch when one node fails. Otherwise, it doesn't work ! You can try configuring a "fence_manual" method (just for testing). After clu1 failure, clu2 should "fence" it. You then have to use the "fence_ack_manual -n clu1.localdomain" command to confirm manually that the fencing is done. http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_manual2 In my cluster.conf, it looks like : (same for clu2) Does it solve your pb ? > > > > > > > > > > > > fstype="ext3" mountpoint="/mnt/sdc" name="Share" options="" > self_fence="0"/> > recovery="restart"> > > > > > > > -------------------------------- > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Xavier Montagutelli Tel : +33 (0)5 55 45 77 20 Service Commun Informatique Fax : +33 (0)5 55 45 75 95 Universite de Limoges 123, avenue Albert Thomas 87060 Limoges cedex -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster No virus found in this incoming message. Checked by AVG - http://www.avg.com Version: 8.0.169 / Virus Database: 270.7.0/1683 - Release Date: 21/09/2008 10.10 From gordan at bobich.net Mon Oct 6 17:45:00 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 06 Oct 2008 18:45:00 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223306502.29679.12.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> Message-ID: <48EA4E9C.7070901@bobich.net> Doug Tucker wrote: >>> 1) Tru64 and TruCluster with Advfs from 7 years ago is simply that much >>> more robust and mature than RHES4 and CS/GFS and therefore tremendously >>> outperforms it...or >> RHEL4 is quite old. It's been a while since I used it for clustering. 
>> RHEL5 has yielded considerably better performance in my experience. > > Interesting. The only way I can see upgrading is: > > 1) using the upgrade options on the production node (scary, and > downtime) or "Upgrade" options never really worked on any OS. I wouldn't bank on it "just working", especially on something as complex as clustering. > 2) do you know if RHEL5 will participate in a RHEL4 cluster? if so, I > could add an RHEL5 node, make it master, and then have the ability to > upgrade the other 2 one at a time. No, you can't mix versions. Even different package versions between the same distro version can cause problems. >>> 2) We have this badly configured. >> There isn't all that much to tune on RHEL4 cluster-wise, most of the >> tweakability has been added more recently than I've last used it. I'd >> say RHEL5 is certainly worth trying. The problem you are having may just >> go away. > > Bummer! Worse, you may need to change the GFS file system options for the new version, so you may end up having to backup/restore the data. I don't think you can avoid cluster downtime for the upgrade. Gordan From tuckerd at engr.smu.edu Mon Oct 6 18:34:23 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Mon, 06 Oct 2008 13:34:23 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48EA4E9C.7070901@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> Message-ID: <1223318063.29679.20.camel@thor.seas.smu.edu> O > Worse, you may need to change the GFS file system options for the new > version, so you may end up having to backup/restore the data. I don't > think you can avoid cluster downtime for the upgrade. Then upgrading is not an option. > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fdinitto at redhat.com Mon Oct 6 18:54:57 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 6 Oct 2008 20:54:57 +0200 (CEST) Subject: [Linux-cluster] can't compile cluster-2.03.07 In-Reply-To: <1223040639.10584.1.camel@dima-desktop> References: <1223040639.10584.1.camel@dima-desktop> Message-ID: On Fri, 3 Oct 2008, ??????? ??????? wrote: > I try compile cluster-2.03.07 with kernel 2.6.26-5 > how i can fix that ? > every time report > make[2]: Entering directory `/usr/src/kernels/linux-2.6.26.5' > > WARNING: Symbol version > dump /usr/src/kernels/linux-2.6.26.5/Module.symvers > is missing; modules will have no dependencies and > modversions. > > Building modules, stage 2. > MODPOST 1 modules > /bin/sh: scripts/mod/modpost: No such file or directory > make[3]: *** [__modpost] Error 127 > make[2]: *** [modules] Error 2 > make[2]: Leaving directory `/usr/src/kernels/linux-2.6.26.5' > make[1]: *** [gnbd.ko] Error 2 > make[1]: Leaving directory `/root/gfs/cluster/gnbd-kernel/src' > make: *** [gnbd-kernel/src] Error 2 These messages are spawn for different reasons. You either need to: - install the kernel headers for the running kernel (depending on the distribution it might change) - if you are using a custom kernel, you need to configure it and prepare the tree. - configure --kernel_src and --kernel_build to point to the right locations. Fabio -- I'm going to make him an offer he can't refuse. 
From fdinitto at redhat.com Mon Oct 6 18:55:45 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 6 Oct 2008 20:55:45 +0200 (CEST) Subject: [Linux-cluster] where i can find define of "volume_id_get_type" In-Reply-To: <1223277729.10374.2.camel@dima-desktop> References: <1223277729.10374.2.camel@dima-desktop> Message-ID: On Mon, 6 Oct 2008, ??????? ??????? wrote: > i try compile last version of cluster package > have error " undefined reference to `volume_id_get_type'" > where i can find define of this function ? libvolume_id Fabio -- I'm going to make him an offer he can't refuse. From gordan at bobich.net Mon Oct 6 19:10:58 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 06 Oct 2008 20:10:58 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223318063.29679.20.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> Message-ID: <48EA62C2.8070002@bobich.net> Doug Tucker wrote: > O >> Worse, you may need to change the GFS file system options for the new >> version, so you may end up having to backup/restore the data. I don't >> think you can avoid cluster downtime for the upgrade. > > Then upgrading is not an option. And reverting back to the Tru64/Alpha system is? Gordan From tuckerd at engr.smu.edu Mon Oct 6 20:05:13 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Mon, 06 Oct 2008 15:05:13 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48EA62C2.8070002@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net> Message-ID: <1223323513.29679.31.camel@thor.seas.smu.edu> > > And reverting back to the Tru64/Alpha system is? Nope, completely out of drive space on that one. I'm basically stuck where I'm at with it underperforming. > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From gordan at bobich.net Mon Oct 6 20:44:02 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 06 Oct 2008 21:44:02 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223323513.29679.31.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net> <1223323513.29679.31.camel@thor.seas.smu.edu> Message-ID: <48EA7892.5030902@bobich.net> Doug Tucker wrote: >> And reverting back to the Tru64/Alpha system is? > > Nope, completely out of drive space on that one. I'm basically stuck > where I'm at with it underperforming. 
I don't mean to teach you to suck eggs, so please don't take this as patronizing, 'cause that's not my intention in any way shape or form, but since 1TB SATA disks go for around $160, could you not just plug a couple of those in as scratch space for the migration? Gordan From Dave.Jones at maritz.com Mon Oct 6 21:10:21 2008 From: Dave.Jones at maritz.com (Jones, Dave) Date: Mon, 6 Oct 2008 16:10:21 -0500 Subject: [Linux-cluster] Recommended 120v & 240v power switch In-Reply-To: <48EA7892.5030902@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net><1223323513.29679.31.camel@thor.seas.smu.edu> <48EA7892.5030902@bobich.net> Message-ID: Hello all. Does anyone have a recommended power switch that works well for power fencing and supports 120v and 240v? Thanks, Dave Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you. From Dave.Jones at maritz.com Mon Oct 6 21:25:35 2008 From: Dave.Jones at maritz.com (Jones, Dave) Date: Mon, 6 Oct 2008 16:25:35 -0500 Subject: [Linux-cluster] Recommended 120v & 240v power switch In-Reply-To: References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net><1223323513.29679.31.camel@thor.seas.smu.edu><48EA7892.5030902@bobich.net> Message-ID: One more thing - Barring a dual-voltage device, which 208v device does everyone prefer? Thanks, D -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jones, Dave Sent: Monday, October 06, 2008 4:10 PM To: linux clustering Subject: [Linux-cluster] Recommended 120v & 240v power switch Hello all. Does anyone have a recommended power switch that works well for power fencing and supports 120v and 240v? Thanks, Dave Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you. 
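Whichever unit is chosen, it ends up in cluster.conf as a power-fencing device; a rough sketch using the APC agent as an example (the address, credentials and outlet number are placeholders):

  <fencedevices>
    <fencedevice agent="fence_apc" name="apc-pdu" ipaddr="192.168.1.50" login="apc" passwd="secret"/>
  </fencedevices>

  <clusternode name="node1" nodeid="1" votes="1">
    <fence>
      <method name="1">
        <device name="apc-pdu" port="3"/>
      </method>
    </fence>
  </clusternode>

A node with dual power supplies needs two <device> entries in the same method so that both outlets are cut together.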
From jparsons at redhat.com Mon Oct 6 21:37:50 2008 From: jparsons at redhat.com (jim parsons) Date: Mon, 06 Oct 2008 17:37:50 -0400 Subject: [Linux-cluster] Recommended 120v & 240v power switch In-Reply-To: References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net><1223323513.29679.31.camel@thor.seas.smu.edu> <48EA7892.5030902@bobich.net> Message-ID: <1223329070.3266.10.camel@localhost.localdomain> On Mon, 2008-10-06 at 16:25 -0500, Jones, Dave wrote: > One more thing - > > Barring a dual-voltage device, which 208v device does everyone prefer? > > Thanks, > D > Take a look at apc AP7911, AP7921, or AP7940, depending on current needs and number of outlets. WTI makes nice switches, too. I do not know their product numbers off hand though. -j From tuckerd at engr.smu.edu Mon Oct 6 21:41:20 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Mon, 06 Oct 2008 16:41:20 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48EA7892.5030902@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net> <1223323513.29679.31.camel@thor.seas.smu.edu> <48EA7892.5030902@bobich.net> Message-ID: <1223329280.29679.50.camel@thor.seas.smu.edu> > > I don't mean to teach you to suck eggs, so please don't take this as > patronizing, 'cause that's not my intention in any way shape or form, > but since 1TB SATA disks go for around $160, could you not just plug a > couple of those in as scratch space for the migration? I can't see a way around some significant downtime even with that, and there is no way they will give me the option to be down from a planned perspective. > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From orkcu at yahoo.com Mon Oct 6 22:25:00 2008 From: orkcu at yahoo.com (Roger Pena Escobio) Date: Mon, 6 Oct 2008 15:25:00 -0700 (PDT) Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223323513.29679.31.camel@thor.seas.smu.edu> Message-ID: <690445.74676.qm@web88308.mail.re4.yahoo.com> --- On Mon, 10/6/08, Doug Tucker wrote: > From: Doug Tucker > Subject: Re: [Linux-cluster] rhcs + gfs performance issues > To: "linux clustering" > Received: Monday, October 6, 2008, 4:05 PM > > > > And reverting back to the Tru64/Alpha system is? > > Nope, completely out of drive space on that one. I'm > basically stuck > where I'm at with it underperforming. did you check if the standalone NFS server also under perform? just to rule out the cluster layer in the "under perform" equation if it is not, then a pasive-active configuration for the cluster could give you what you want, but still using GFS filesystem so you can mount it simultaneous on the demand (for backup). 
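A quick way to run that comparison is to time the same two operations in three places: on the GFS mount locally on the server, on the same tree over NFS from a client, and on a local non-cluster disk as a baseline. The paths below are placeholders:

        # metadata-heavy pass (stats every inode in the tree)
        time ls -lR /gfs/home/someuser > /dev/null
        # streaming read of a single large file
        time dd if=/gfs/home/someuser/bigfile of=/dev/null bs=1M

If the local GFS pass is already slow, the GFS/DLM layer is the bottleneck and the gfs_tool tunables discussed later in this thread are the place to look; if only the NFS pass is slow, the export options and nfsd thread count deserve the attention first.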
thanks roger From gordan at bobich.net Mon Oct 6 22:33:53 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 06 Oct 2008 23:33:53 +0100 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <1223329280.29679.50.camel@thor.seas.smu.edu> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net> <1223323513.29679.31.camel@thor.seas.smu.edu> <48EA7892.5030902@bobich.net> <1223329280.29679.50.camel@thor.seas.smu.edu> Message-ID: <48EA9251.40606@bobich.net> Doug Tucker wrote: >> I don't mean to teach you to suck eggs, so please don't take this as >> patronizing, 'cause that's not my intention in any way shape or form, >> but since 1TB SATA disks go for around $160, could you not just plug a >> couple of those in as scratch space for the migration? > > I can't see a way around some significant downtime even with that, and > there is no way they will give me the option to be down from a planned > perspective. So, out of nowhere straight into production, without performance user acceptance testing period? And they won't allow any planned downtime? My mind boggles. Good luck. Gordan From linux-cluster at merctech.com Mon Oct 6 23:59:01 2008 From: linux-cluster at merctech.com (linux-cluster at merctech.com) Date: Mon, 06 Oct 2008 19:59:01 -0400 Subject: [Linux-cluster] help wanted configuring services Message-ID: <602.1223337541@mirchi> Is it possible to set up a hierarchy of services, in the same way that a service is made up of individual resources? I'm trying to set up a clustered web server using RHCS 5 on CentOS 5.2, with the latest versions from the "upstream provider's" production release. The cluster provides several web applications, each of which has it's own resources that Apache doesn't need to know about directly. Each application relies on Apache as a presentation layer. I don't want to make the individual applications' dependencies separate resources within a single "Apache" service, because that makes management difficult. For example, is it possible to create a structure like: Service: Apache Private Resource: IP address Private Resource: GFS Vol1 Shared Service: Wiki Shared Service: SVN Shared Service: Calendar Service: Wiki Private Resource: GFS Vol2 Private Resource: MySQL "wiki" instance Service: SVN Private Resource: MySQL "svn" instance Private Resource: GFS Vol3 Service: Calendar (no resources beyond apache) Note that resources like "GFS Vol2" and "MySQL wiki instance" are assigned to the "Wiki" service, not directly to the "Apache" service. The Apache service sees "Wiki" as a resource. With this kind of structure, I could administratively disable the "Wiki" service without causing the other web applications to restart or relocate. However, if a resource that the Wiki requires fails on the active web server (for example, GFS Vol 2), the the standard failover policy would apply...the Wiki service would restart and if that is unsuccessful, then the Apache service (and it's dependencies--Wiki, SVN, Calendar, IP address, GFS Vol1) would relocate. Is there any way to set up this kind of structure with RHCS? 
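rgmanager's resource trees get part of the way there: resources can be nested inside a single service, and a branch can be marked with __independent_subtree="1" so that a failure in that branch restarts only the branch rather than the whole service. The sketch below is purely illustrative (one service with nested resources, not a true service-of-services; the names, devices and mount points are invented):

        <service name="web" domain="webfarm" autostart="1">
                <ip address="10.0.0.80"/>
                <clusterfs name="vol1" device="/dev/vg_web/vol1" mountpoint="/srv/www" fstype="gfs"/>
                <script name="httpd" file="/etc/init.d/httpd"/>
                <clusterfs name="vol2" device="/dev/vg_web/vol2" mountpoint="/srv/wiki" fstype="gfs" __independent_subtree="1">
                        <script name="wiki-db" file="/etc/init.d/mysqld-wiki"/>
                </clusterfs>
        </service>

What this does not give is per-application administrative control: clusvcadm still enables and disables whole services, so being able to stop just the wiki without touching the rest would still mean splitting it out as a service of its own.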
Thanks, Mark ----- Mark Bergman http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com From stuarta at squashedfrog.net Tue Oct 7 08:46:26 2008 From: stuarta at squashedfrog.net (Stuart Auchterlonie) Date: Tue, 07 Oct 2008 09:46:26 +0100 Subject: [Linux-cluster] Cisco working configuration In-Reply-To: <20081001211109.GA11341@aaron> References: <2000762384.122241222888753010.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> <620016859.122671222888921460.JavaMail.root@zmail02.collab.prod.int.phx2.redhat.com> <20081001211109.GA11341@aaron> Message-ID: <48EB21E2.70803@squashedfrog.net> Jakub Suchy wrote: > Leo Pleiman wrote: >> The kbase article can be found at http://kbase.redhat.com/faq/FAQ_51_11755.shtm >> It has a link to Cisco's web site enumerating 5 possible solutions. http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml > > Hello, > I am aware of these documents and I have tried all these solutions. > We had to turn on 'ip igmp snooping querier' as documented in the link on the cisco website above, and it worked okay.. Stuart From david at craigon.co.uk Tue Oct 7 10:34:07 2008 From: david at craigon.co.uk (David) Date: Tue, 07 Oct 2008 11:34:07 +0100 Subject: [Linux-cluster] My patch Message-ID: <48EB3B1F.1010700@craigon.co.uk> Did this patch ever get merged in? https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html From s.wendy.cheng at gmail.com Tue Oct 7 13:57:20 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 07 Oct 2008 08:57:20 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <690445.74676.qm@web88308.mail.re4.yahoo.com> References: <690445.74676.qm@web88308.mail.re4.yahoo.com> Message-ID: <48EB6AC0.6070100@gmail.com> Hopefully the following provide some relieves ... 1. Enable lock trimming tunable. It is particularly relevant if NFS-GFS is used by development type of workloads (editing, compiling, build, etc) and/or after filesystem backup. Unlike fast statfs, this tunable is per-node base (you don't need to have the same value on each of the nodes and a mix of on-off within the same cluster is ok). Make the trimming very aggressive on backup node (> 50% where you run backup) and moderate on your active node (< 50%). Try to experiment with different values to fit the workload. Googling "gfs lock trimming wcheng" to pick up the technical background if needed. shell> gfs_tool settune glock_purge (e.g. gfs_tool settune /mnt/gfs1 glock_purge 50) 2. Turn on readahead tunable. It is effective for large file (stream IO) read performance. As I recalled, there was a cluster (with IPTV application) used val=256 for 400-500M files. Another one with 2G file size used val=2048. Again, it is per-node base so different values are ok on different nodes. shell> gfs_tool settune seq_readahead (e.g. gfs_tool settune /mnt/gfs1 seq_readahead 2048) 3. Fast statfs tunable - you have this on already ? Make sure they need to be same across cluster nodes. 4. Understand the risks and performance implications of NFS server's "async" vs. "sync" options. Linux NFS server "sync" options are controlled by two different mechanisms - mount and export. By default, mount is "aysnc" and export is "sync". Even with specific "async" mount request, Linux server uses "sync" export as default that is particularly troublesome for gfs. 
I don't plan to give an example and/or suggest the exact export option here - hopefully this will force folks to do more research to fully understand the trade-off between performance and data liability. Most of the proprietary NFS servers on the market today use hardware features to relieve this conflict between performance and data integrity. Mainline Linux servers (and RHEL) are entirely software-based, so they generally have problems in this regard. GFS1 in general doesn't do well on "sync" performance (the journal layer is too bulky). GFS2 has the potential to do better (but I'm not sure).

There are also a few other things worth mentioning, but my flight is being called for boarding .. I'll stop here .......

-- Wendy

From gniagnia at gmail.com Tue Oct 7 14:24:38 2008 From: gniagnia at gmail.com (gnia gnia) Date: Tue, 7 Oct 2008 16:24:38 +0200 Subject: [Linux-cluster] how to configure qdisk in a two nodes cluster with mirrored LVM Message-ID:

Hello all,

Situation: We have a two-node cluster (we don't use GFS). Only one node has an active service. The other node is only there in case the first node crashes (the application automatically restarts on the healthy node). This service has a file system resource that is a mirrored LV across two storage bays (HSV210 - HP EVA8000; let's call them SAN1 and SAN2). We also have a quorum disk that is declared on SAN1.

Today, we tested the cluster behaviour in case of a SAN2 outage (we did so by deactivating zoning between the nodes and the SAN2 controllers). Immediately, the I/Os on the mirrored LV stopped. After 2 or 3 minutes, the mirrored LV became linear and I/Os resumed on the available storage bay:

Sep 25 12:02:01 redhat lvm[15525]: Mirror device, 253:7, has failed.
Sep 25 12:02:01 redhat lvm[15525]: Device failure in vghpdriver-lvhpdriver
Sep 25 12:02:22 redhat lvm[15525]: WARNING: Bad device removed from mirror volume, vghpdriver/lvhpdriver
Sep 25 12:02:22 redhat kernel: end_request: I/O error, dev sda, sector 5559920
Sep 25 12:02:22 redhat lvm[15525]: WARNING: Mirror volume, vghpdriver/lvhpdriver converted to linear due to device failure.

This is pretty much what we hoped for. But when we do the same test on SAN1 (the one with the qdisk), the cluster instantly becomes inquorate and stops working.

Here is our qdisk configuration :

Here is the output of 'cman_tool status'

# cman_tool status
Protocol version: 5.0.1
Config version: 2
Cluster name: clu_HPDRIVER
Cluster ID: 23324
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 2
Total_votes: 3
Quorum: 2
Active subsystems: 5
Node name: redhat.test.com
Node ID: 2
Node addresses: 192.168.1.6

I tried to replace votes="2" by votes="1" in /etc/cluster/cluster.conf... which solved the problem. But is it safe to do this?
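On the safety question, the vote arithmetic is worth writing out. cman stays quorate while the votes it can see add up to more than half of expected_votes, and expected_votes is the sum of the node votes and the quorum-disk votes. Assuming one vote per node, the two-node cases work out as follows:

        qdisk votes="1":  expected_votes = 2 + 1 = 3, quorum = 2
            both nodes up, qdisk LUN lost  -> 2 votes, still quorate
            one node up, qdisk reachable   -> 2 votes, still quorate
            one node up, qdisk lost        -> 1 vote, inquorate (nothing runs, the safe side)

        qdisk votes="2":  expected_votes = 2 + 2 = 4, quorum = 3
            both nodes up, qdisk LUN lost  -> 2 votes, inquorate (the failure described above)

So a single qdisk vote, the usual n-1 sizing for an n-node cluster, is generally the right setting here: as far as vote counting goes the cluster survives losing either one node or the quorum disk itself, while a lone node without the quorum disk still cannot run on its own.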
Thanks From tuckerd at engr.smu.edu Tue Oct 7 15:00:17 2008 From: tuckerd at engr.smu.edu (Doug Tucker) Date: Tue, 07 Oct 2008 10:00:17 -0500 Subject: [Linux-cluster] rhcs + gfs performance issues In-Reply-To: <48EA9251.40606@bobich.net> References: <48BF35C50600251B@> (added by postmaster@mail.o2.co.uk) <1223053011.22918.62.camel@thor.seas.smu.edu> <48E6567D.5020508@bobich.net> <1223063224.22918.102.camel@thor.seas.smu.edu> <48E7A8CA.1000502@bobich.net> <1223306502.29679.12.camel@thor.seas.smu.edu> <48EA4E9C.7070901@bobich.net> <1223318063.29679.20.camel@thor.seas.smu.edu> <48EA62C2.8070002@bobich.net> <1223323513.29679.31.camel@thor.seas.smu.edu> <48EA7892.5030902@bobich.net> <1223329280.29679.50.camel@thor.seas.smu.edu> <48EA9251.40606@bobich.net> Message-ID: <1223391617.25152.40.camel@thor.seas.smu.edu> > > > > I can't see a way around some significant downtime even with that, and > > there is no way they will give me the option to be down from a planned > > perspective. > > So, out of nowhere straight into production, without performance user > acceptance testing period? And they won't allow any planned downtime? My > mind boggles. Yours too huh? This is the strangest place I have ever worked quite frankly. I've never been anywhere I could not set aside a 2 hour window at 3 am once a month for upgrades/maintenance. They don't allow me that here. Migrating from old file server to new one, was done with zero downtime and no interruption to the user community. Due to $$$, very little redundancy. Sure, downtime does happen, but only when something breaks. I could go on, I think you get the picture and whining about it doesn't help me here. Straight into production...well, not exactly. I set up a cluster, and moved one application over and it ran for about 3 months before we began the user moves. Once the users and mail were moved, that's when the load issues reared it's ugly head. Like I said, was really bad at first. Had to bump the nfs processes to 256..that helped some..setting the fs to fast = 1, had a much bigger impact. The odd thing is, it doesn't seem to take much to drive up the load. Being an engineering school, we have a lot of cadence users, and cadence writes 2-5k files on a big job, and it doesn't take more than 2 or 3 users doing this, along with the normal stuff always touching the fileserver (such as mail, web, etc) to drive up load. I can virus scan my mapped home directory and watch load jump by 2 or 3. Mounting my old home directory on the old file server and doing the same thing, you wouldn't even know I was touching files out there. It's like directory/file access is just very expensive for some reason and it goes against everything I know :P. Let me run this by you. I thought about another potential upgrade path. What if I remove one node from the cluster and run on one node, take the 2nd down, install 5, get it prepped. Is there anyway in the world to somehow bring it up and have it mount the volumes AS master and take the current primary down to then rebuild it? I think my answer is no, but thought it worth asking. This inability to cross version participate seems to really be my achilles heal here in getting it upgraded. > > Good luck. 
> > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From shawnlhood at gmail.com Tue Oct 7 17:33:45 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Tue, 7 Oct 2008 13:33:45 -0400 Subject: [Linux-cluster] GFS hanging on 3 node RHEL4 cluster Message-ID: Problem: It seems that IO on one machine in the cluster (not always the same machine) will hang and all processes accessing clustered LVs will block. Other machines will follow suit shortly thereafter until the machine that first exhibited the problem is rebooted (via fence_drac manually). No messages in dmesg, syslog, etc. Filesystems recently fsckd. Hardware: Dell 1950s (similar except memory -- 3x 16GB RAM, 1x 8GB RAM). Running RHEL4 ES U7. Four machines Onboard gigabit NICs (Machines use little bandwidth, and all network traffic including DLM share NICs) QLogic 2462 PCI-Express dual channel FC HBAs QLogic SANBox 5200 FC switch Apple XRAID which presents as two LUNs (~4.5TB raw aggregate) Cisco Catalyst switch Simple four machine RHEL4 U7 cluster running kernel 2.6.9-78.0.1.ELsmp x86_64 with the following packages: ccs-1.0.12-1 cman-1.0.24-1 cman-kernel-smp-2.6.9-55.13.el4_7.1 cman-kernheaders-2.6.9-55.13.el4_7.1 dlm-kernel-smp-2.6.9-54.11.el4_7.1 dlm-kernheaders-2.6.9-54.11.el4_7.1 fence-1.32.63-1.el4_7.1 GFS-6.1.18-1 GFS-kernel-smp-2.6.9-80.9.el4_7.1 One clustered VG. Striped across two physical volumes, which correspond to each side of an Apple XRAID. Clustered volume group info: --- Volume group --- VG Name hq-san System ID Format lvm2 Metadata Areas 2 Metadata Sequence No 50 VG Access read/write VG Status resizable Clustered yes Shared no MAX LV 0 Cur LV 3 Open LV 3 Max PV 0 Cur PV 2 Act PV 2 VG Size 4.55 TB PE Size 4.00 MB Total PE 1192334 Alloc PE / Size 905216 / 3.45 TB Free PE / Size 287118 / 1.10 TB VG UUID hfeIhf-fzEq-clCf-b26M-cMy3-pphm-B6wmLv Logical volumes contained with hq-san VG: cam_development hq-san -wi-ao 500.00G qa hq-san -wi-ao 1.07T svn_users hq-san -wi-ao 1.89T All four machines mount svn_users, two machines mount qa, and one mounts cam_development. /etc/cluster/cluster.conf: -- Shawn Hood 910.670.1819 m From shawnlhood at gmail.com Tue Oct 7 17:40:51 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Tue, 7 Oct 2008 13:40:51 -0400 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: More info: All filesystems mounted using noatime,nodiratime,noquota. 
All filesystems report the same data from gfs_tool gettune: limit1 = 100 ilimit1_tries = 3 ilimit1_min = 1 ilimit2 = 500 ilimit2_tries = 10 ilimit2_min = 3 demote_secs = 300 incore_log_blocks = 1024 jindex_refresh_secs = 60 depend_secs = 60 scand_secs = 5 recoverd_secs = 60 logd_secs = 1 quotad_secs = 5 inoded_secs = 15 glock_purge = 0 quota_simul_sync = 64 quota_warn_period = 10 atime_quantum = 3600 quota_quantum = 60 quota_scale = 1.0000 (1, 1) quota_enforce = 0 quota_account = 0 new_files_jdata = 0 new_files_directio = 0 max_atomic_write = 4194304 max_readahead = 262144 lockdump_size = 131072 stall_secs = 600 complain_secs = 10 reclaim_limit = 5000 entries_per_readdir = 32 prefetch_secs = 10 statfs_slots = 64 max_mhc = 10000 greedy_default = 100 greedy_quantum = 25 greedy_max = 250 rgrp_try_threshold = 100 statfs_fast = 0 seq_readahead = 0 And data on the FS from gfs_tool counters: locks 2948 locks held 1352 freeze count 0 incore inodes 1347 metadata buffers 0 unlinked inodes 0 quota IDs 0 incore log buffers 0 log space used 0.05% meta header cache entries 0 glock dependencies 0 glocks on reclaim list 0 log wraps 2 outstanding LM calls 0 outstanding BIO calls 0 fh2dentry misses 0 glocks reclaimed 223287 glock nq calls 1812286 glock dq calls 1810926 glock prefetch calls 101158 lm_lock calls 198294 lm_unlock calls 142643 lm callbacks 341621 address operations 502691 dentry operations 395330 export operations 0 file operations 199243 inode operations 984276 super operations 1727082 vm operations 0 block I/O reads 520531 block I/O writes 130315 locks 171423 locks held 85717 freeze count 0 incore inodes 85376 metadata buffers 1474 unlinked inodes 0 quota IDs 0 incore log buffers 24 log space used 0.83% meta header cache entries 6621 glock dependencies 2037 glocks on reclaim list 0 log wraps 428 outstanding LM calls 0 outstanding BIO calls 0 fh2dentry misses 0 glocks reclaimed 45784677 glock nq calls 962822941 glock dq calls 962595532 glock prefetch calls 20215922 lm_lock calls 40708633 lm_unlock calls 23410498 lm callbacks 64156052 address operations 705464659 dentry operations 19701522 export operations 0 file operations 364990733 inode operations 98910127 super operations 440061034 vm operations 7 block I/O reads 90394984 block I/O writes 131199864 locks 2916542 locks held 1476005 freeze count 0 incore inodes 1454165 metadata buffers 12539 unlinked inodes 100 quota IDs 0 incore log buffers 11 log space used 13.33% meta header cache entries 9928 glock dependencies 110 glocks on reclaim list 0 log wraps 2393 outstanding LM calls 25 outstanding BIO calls 0 fh2dentry misses 55546 glocks reclaimed 127341056 glock nq calls 867427 glock dq calls 867430 glock prefetch calls 36679316 lm_lock calls 110179878 lm_unlock calls 84588424 lm callbacks 194863553 address operations 250891447 dentry operations 359537343 export operations 390941288 file operations 399156716 inode operations 537830 super operations 1093798409 vm operations 774785 block I/O reads 258044208 block I/O writes 101585172 On Tue, Oct 7, 2008 at 1:33 PM, Shawn Hood wrote: > Problem: > It seems that IO on one machine in the cluster (not always the same > machine) will hang and all processes accessing clustered LVs will > block. Other machines will follow suit shortly thereafter until the > machine that first exhibited the problem is rebooted (via fence_drac > manually). No messages in dmesg, syslog, etc. Filesystems recently > fsckd. > > Hardware: > Dell 1950s (similar except memory -- 3x 16GB RAM, 1x 8GB RAM). > Running RHEL4 ES U7. 
Four machines > Onboard gigabit NICs (Machines use little bandwidth, and all network > traffic including DLM share NICs) > QLogic 2462 PCI-Express dual channel FC HBAs > QLogic SANBox 5200 FC switch > Apple XRAID which presents as two LUNs (~4.5TB raw aggregate) > Cisco Catalyst switch > > Simple four machine RHEL4 U7 cluster running kernel 2.6.9-78.0.1.ELsmp > x86_64 with the following packages: > ccs-1.0.12-1 > cman-1.0.24-1 > cman-kernel-smp-2.6.9-55.13.el4_7.1 > cman-kernheaders-2.6.9-55.13.el4_7.1 > dlm-kernel-smp-2.6.9-54.11.el4_7.1 > dlm-kernheaders-2.6.9-54.11.el4_7.1 > fence-1.32.63-1.el4_7.1 > GFS-6.1.18-1 > GFS-kernel-smp-2.6.9-80.9.el4_7.1 > > One clustered VG. Striped across two physical volumes, which > correspond to each side of an Apple XRAID. > Clustered volume group info: > --- Volume group --- > VG Name hq-san > System ID > Format lvm2 > Metadata Areas 2 > Metadata Sequence No 50 > VG Access read/write > VG Status resizable > Clustered yes > Shared no > MAX LV 0 > Cur LV 3 > Open LV 3 > Max PV 0 > Cur PV 2 > Act PV 2 > VG Size 4.55 TB > PE Size 4.00 MB > Total PE 1192334 > Alloc PE / Size 905216 / 3.45 TB > Free PE / Size 287118 / 1.10 TB > VG UUID hfeIhf-fzEq-clCf-b26M-cMy3-pphm-B6wmLv > > Logical volumes contained with hq-san VG: > cam_development hq-san -wi-ao 500.00G > qa hq-san -wi-ao 1.07T > svn_users hq-san -wi-ao 1.89T > > All four machines mount svn_users, two machines mount qa, and one > mounts cam_development. > > /etc/cluster/cluster.conf: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ipaddr="redacted" login="root" passwd="redacted"/> > ipaddr="redacted" login="root" passwd="redacted"/> > ipaddr="redacted" login="root" passwd="redacted"/> > ipaddr="redacted" login="root" passwd="redacted"/> > > > > > > > > > > > -- > Shawn Hood > 910.670.1819 m > -- Shawn Hood 910.670.1819 m From shawnlhood at gmail.com Tue Oct 7 17:43:07 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Tue, 7 Oct 2008 13:43:07 -0400 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: And for another follow-up in the interest of full disclosure, I don't recall the specifics, but it seems dlm_recvd was eating up all the CPU cycles on one of the machines, and others seemed to follow suit shortly thereafter. Sorry for the flood! Shawn -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.osullivan at auckland.ac.nz Tue Oct 7 19:10:13 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Wed, 8 Oct 2008 08:10:13 +1300 (NZDT) Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster Message-ID: <1393.128.187.171.228.1223406613.squirrel@mail.esc.auckland.ac.nz> Hi Mark, This is just an experimental cluster for now, not production, so 2-nodes is sufficient (as long as it doesn't significantly alter the setup, which I don;t think it does). I have two multi-pathed iSCSI targets for storage, one each on two separate boxes. I have got this going previously on a slightly different set-up elsewhere, but this is my next effort that incorporates easy shutdown/startup of the storage and cluster. Except I can't get the LV up and running... Thanks, Mike Date: Mon, 6 Oct 2008 11:03:57 -0500 From: "Mark Chaney" Subject: RE: [Linux-cluster] Can't create LV in 2-node cluster To: "'linux clustering'" Message-ID: <04cb01c927cd$24b23ed0$6e16bc70$@com> Content-Type: text/plain; charset="us-ascii" What are you using for shared storage? 
Also, 2 node clusters are highly discouraged from my experience and recommendations of others. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of michael.osullivan at auckland.ac.nz Sent: Monday, October 06, 2008 10:28 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] Can't create LV in 2-node cluster Hi everyone, I have created a 2-node cluster and (after a little difficulty) created a clustered volume group visible on both nodes. However, I can't create a logical volume on either node. I get the following error: Error locking on node : Error backing up metadata, can't find VG for group #global Aborting. Failed to activate new LV to wipe the start of it. I have locking_type = 3 and locking_library = liblvm2clusterlock.so. The metadata areas and sequence nos are the same using vgdisplay on both nodes. Can anyone help me create LVs? I am happy to provide any extra info needed. Thanks, Mike From macscr at macscr.com Tue Oct 7 19:24:43 2008 From: macscr at macscr.com (Mark Chaney) Date: Tue, 7 Oct 2008 14:24:43 -0500 Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster In-Reply-To: <1393.128.187.171.228.1223406613.squirrel@mail.esc.auckland.ac.nz> References: <1393.128.187.171.228.1223406613.squirrel@mail.esc.auckland.ac.nz> Message-ID: <00b901c928b2$5b1b3810$1151a830$@com> So is clvmd running fine on both nodes? If its not, your not going to be able to do anything with the shared storage. After you have verified its running, do a vgscan. If you get any errors, you have to fix those first before you can move ahead to worrying about the lv issues. I am far from a cluster expert. Im actually having my own issues right now, but I am just passing on some info that I have learned along the way. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of michael.osullivan at auckland.ac.nz Sent: Tuesday, October 07, 2008 2:10 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster Hi Mark, This is just an experimental cluster for now, not production, so 2-nodes is sufficient (as long as it doesn't significantly alter the setup, which I don;t think it does). I have two multi-pathed iSCSI targets for storage, one each on two separate boxes. I have got this going previously on a slightly different set-up elsewhere, but this is my next effort that incorporates easy shutdown/startup of the storage and cluster. Except I can't get the LV up and running... Thanks, Mike Date: Mon, 6 Oct 2008 11:03:57 -0500 From: "Mark Chaney" Subject: RE: [Linux-cluster] Can't create LV in 2-node cluster To: "'linux clustering'" Message-ID: <04cb01c927cd$24b23ed0$6e16bc70$@com> Content-Type: text/plain; charset="us-ascii" What are you using for shared storage? Also, 2 node clusters are highly discouraged from my experience and recommendations of others. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of michael.osullivan at auckland.ac.nz Sent: Monday, October 06, 2008 10:28 AM To: linux-cluster at redhat.com Subject: [Linux-cluster] Can't create LV in 2-node cluster Hi everyone, I have created a 2-node cluster and (after a little difficulty) created a clustered volume group visible on both nodes. However, I can't create a logical volume on either node. 
I get the following error: Error locking on node : Error backing up metadata, can't find VG for group #global Aborting. Failed to activate new LV to wipe the start of it. I have locking_type = 3 and locking_library = liblvm2clusterlock.so. The metadata areas and sequence nos are the same using vgdisplay on both nodes. Can anyone help me create LVs? I am happy to provide any extra info needed. Thanks, Mike -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From caronc at navcanada.ca Tue Oct 7 20:57:24 2008 From: caronc at navcanada.ca (Caron, Chris) Date: Tue, 7 Oct 2008 16:57:24 -0400 Subject: [Linux-cluster] GFS & journal resources ... common ratio? Message-ID: <474534909BE4064E853161350C47578E0C15999E@ncrmail1.corp.navcan.ca> I just have a quick question regarding the amount of storage occupied by the journals. Is there a common ratio to determine how much space will be occupied? I'm doing very rough map and experimenting with different conditions, but I can't seem to get a common mechanism for predicting the usable storage limit given the number of locks. Currently we have an 8 node cluster with 9 journals defined (+1 just in case)... We are awaiting the external storage hardware; but in the time being I am using exporting iscsi drives from another machine to work with. With 9 locks and creating a 1.75GB partition, I get 655 MB usable from it.... (roughly calculated ratio: 3.37) With 9 locks and creating a 1.50GB partition, I get 393 MB usable from it.... (roughly calculated ratio: 2.36) With 9 locks and creating a 1.25GB partition, I get 131 MB usable from it.... (roughly calculated ratio: 0.90) I'm obviously missing a figure during my calculations because the ratios vary each time... I'd have expected them to be a bit more constant... At the end of the day I want to be able to know ahead of time how much hard-disk space I need to allocate for 'X usable' based on the number of locks I'm using. The simple equation I used was: X/9=USABLE/ALLOCATED ie: (X/9=.655/1.75 = 3.369) Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From kanderso at redhat.com Tue Oct 7 21:11:37 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Tue, 07 Oct 2008 16:11:37 -0500 Subject: [Linux-cluster] GFS & journal resources ... common ratio? In-Reply-To: <474534909BE4064E853161350C47578E0C15999E@ncrmail1.corp.navcan.ca> References: <474534909BE4064E853161350C47578E0C15999E@ncrmail1.corp.navcan.ca> Message-ID: <1223413897.4420.54.camel@dhcp80-204.msp.redhat.com> On Tue, 2008-10-07 at 16:57 -0400, Caron, Chris wrote: > I just have a quick question regarding the amount of storage occupied > by the journals. Is there a common ratio to determine how much space > will be occupied? > Default Journal size is 128MB. > I?m doing very rough map and experimenting with different conditions, > but I can?t seem to get a common mechanism for predicting the usable > storage limit given the number of locks. > > > > Currently we have an 8 node cluster with 9 journals defined (+1 just > in case)? We are awaiting the external storage hardware; but in the > time being I am using exporting iscsi drives from another machine to > work with. 9 * 128MB = 1152MB - which is pretty consistent with your remaining space below. > > > > With 9 locks and creating a 1.75GB partition, I get 655 MB usable from > it?. (roughly calculated ratio: 3.37) > > With 9 locks and creating a 1.50GB partition, I get 393 MB usable from > it?. 
(roughly calculated ratio: 2.36) > > With 9 locks and creating a 1.25GB partition, I get 131 MB usable from > it?. (roughly calculated ratio: 0.90) > > > > I?m obviously missing a figure during my calculations because the > ratios vary each time? I?d have expected them to be a bit more > constant? > > At the end of the day I want to be able to know ahead of time how much > hard-disk space I need to allocate for ?X usable? based on the number > of locks I?m using. > > > > The simple equation I used was: > > X/9=USABLE/ALLOCATED > > > > ie: (X/9=.655/1.75 = 3.369) > > > > > > Chris > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From stelgn at gmail.com Wed Oct 8 09:15:27 2008 From: stelgn at gmail.com (Ernest Neo Wee Teck) Date: Wed, 8 Oct 2008 17:15:27 +0800 Subject: [Linux-cluster] GNBD multi import? Message-ID: <93af20e00810080215j6c8a625fwfef560f314468a52@mail.gmail.com> I have 3 servers deployed with CLVM, GNBD, CMAN. fencing using fence_gnbd ServerA gnbd_export a logical block (LVM) named "r1" Is it possible for ServerB and ServerC to import "r1" at the same time? Can domU running on either ServerB and C use the imported r1 (/dev/gnbd/r1)? eg. phy:/dev/gnbd/r1 Is live migration feasible in this case if r1 could be imported on ServerB and C? Trying to test the concept of live migration with shared storage here Cheers, Ernest From michael.osullivan at auckland.ac.nz Wed Oct 8 16:00:16 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Thu, 9 Oct 2008 05:00:16 +1300 (NZDT) Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster Message-ID: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> Hi Mark, clvmd is running fine on both nodes. The result of "service clvmd status" is clvmd (pid xxxxx) is running... active volumes: LogVol00 LogVol01 The result of vgscan is Reading all physical volumes. This may take a while... Found volume group "iscsi_raid_vg" using metadata type lvm2 Found volume group "VolGroup00" using metadata type lvm2 I just can't create a logical volume either from the command line or using system-config-lvm... Thanks, Mike * From: "Mark Chaney" * To: "'linux clustering'" * Subject: RE: [Linux-cluster] RE: Can't create LV in 2-node cluster * Date: Tue, 7 Oct 2008 14:24:43 -0500 So is clvmd running fine on both nodes? If its not, your not going to be able to do anything with the shared storage. After you have verified its running, do a vgscan. If you get any errors, you have to fix those first before you can move ahead to worrying about the lv issues. I am far from a cluster expert. Im actually having my own issues right now, but I am just passing on some info that I have learned along the way. -----Original Message----- From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of michael osullivan auckland ac nz Sent: Tuesday, October 07, 2008 2:10 PM To: linux-cluster redhat com Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster Hi Mark, This is just an experimental cluster for now, not production, so 2-nodes is sufficient (as long as it doesn't significantly alter the setup, which I don;t think it does). I have two multi-pathed iSCSI targets for storage, one each on two separate boxes. I have got this going previously on a slightly different set-up elsewhere, but this is my next effort that incorporates easy shutdown/startup of the storage and cluster. 
Except I can't get the LV up and running... Thanks, Mike Date: Mon, 6 Oct 2008 11:03:57 -0500 From: "Mark Chaney" Subject: RE: [Linux-cluster] Can't create LV in 2-node cluster To: "'linux clustering'" Message-ID: <04cb01c927cd$24b23ed0$6e16bc70$ com> Content-Type: text/plain; charset="us-ascii" What are you using for shared storage? Also, 2 node clusters are highly discouraged from my experience and recommendations of others. -----Original Message----- From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of michael osullivan auckland ac nz Sent: Monday, October 06, 2008 10:28 AM To: linux-cluster redhat com Subject: [Linux-cluster] Can't create LV in 2-node cluster Hi everyone, I have created a 2-node cluster and (after a little difficulty) created a clustered volume group visible on both nodes. However, I can't create a logical volume on either node. I get the following error: Error locking on node : Error backing up metadata, can't find VG for group #global Aborting. Failed to activate new LV to wipe the start of it. I have locking_type = 3 and locking_library = liblvm2clusterlock.so. The metadata areas and sequence nos are the same using vgdisplay on both nodes. Can anyone help me create LVs? I am happy to provide any extra info needed. Thanks, Mike -- Linux-cluster mailing list Linux-cluster redhat com https://www.redhat.com/mailman/listinfo/linux-cluster From Dave.Jones at maritz.com Wed Oct 8 16:04:04 2008 From: Dave.Jones at maritz.com (Jones, Dave) Date: Wed, 8 Oct 2008 11:04:04 -0500 Subject: [Linux-cluster] 2nd ILO idea In-Reply-To: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> References: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> Message-ID: Hello all. Has anyone experimented with adding a second ILO card into HP servers, reserving 1 for normal ILO access and the second for fencing? Just curious. I'm not even sure if they sell add-on ILO boards anymore. Or if 2 of them would work in the same server. Thanks, Dave Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you. From jruemker at redhat.com Wed Oct 8 18:33:45 2008 From: jruemker at redhat.com (John Ruemker) Date: Wed, 08 Oct 2008 14:33:45 -0400 Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster In-Reply-To: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> References: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> Message-ID: <48ECFD09.6010801@redhat.com> michael.osullivan at auckland.ac.nz wrote: > Hi Mark, > > clvmd is running fine on both nodes. The result of "service clvmd status" is > > clvmd (pid xxxxx) is running... > active volumes: LogVol00 LogVol01 > > The result of vgscan is > > Reading all physical volumes. This may take a while... 
> Found volume group "iscsi_raid_vg" using metadata type lvm2 > Found volume group "VolGroup00" using metadata type lvm2 > > I just can't create a logical volume either from the command line or using > system-config-lvm... Did you partition the device before adding a physical volume to it? If so, did you run partprobe on both nodes? A common scenario is to partition the device from node 1 and create a physical volume on it. However the partition table is not automatically read on the second node so it has no idea there is a partition there. When clvmd tells the second node to activate a vg or lv on this unknown device, that node responds that it can't lock on to the device since it has no idea what it is. If you do end up in this situation then usually the solution is to do this on both nodes # rm /etc/lvm/cache/.cache # partprobe # clvmd -R Then from one node: # pvscan # vgscan # lvchange -ay vg/lv Try this and see if it helps. -John From jmartin at learningobjects.com Wed Oct 8 19:53:55 2008 From: jmartin at learningobjects.com (James Martin) Date: Wed, 08 Oct 2008 15:53:55 -0400 Subject: [Linux-cluster] 2nd ILO idea In-Reply-To: References: <1482.128.187.169.21.1223481616.squirrel@mail.esc.auckland.ac.nz> Message-ID: <48ED0FD3.2000705@learningobjects.com> I don't believe the sell them as add-ons except for older servers that didn't come with them integrated. Why not just by a APC PDU or something similar that lets you power on/off specific outlets? James Jones, Dave wrote: > Hello all. > > Has anyone experimented with adding a second ILO card into HP servers, > reserving 1 for normal ILO access and the second for fencing? > > Just curious. I'm not even sure if they sell add-on ILO boards > anymore. Or if 2 of them would work in the same server. > > Thanks, > Dave > > Confidentiality Warning: This e-mail contains information intended only for the use of the individual or entity named above. If the reader of this e-mail is not the intended recipient or the employee or agent responsible for delivering it to the intended recipient, any dissemination, publication or copying of this e-mail is strictly prohibited. The sender does not accept any responsibility for any loss, disruption or damage to your data or computer system that may occur while using data contained in, or transmitted with, this e-mail. > If you have received this e-mail in error, please immediately notify us by return e-mail. Thank you. > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jamesc at exa.com Wed Oct 8 20:56:35 2008 From: jamesc at exa.com (James Chamberlain) Date: Wed, 8 Oct 2008 16:56:35 -0400 Subject: [Linux-cluster] gfs_grow In-Reply-To: <81D8B57D-B9C8-4AA0-8BEC-F45212795FB6@exa.com> References: <81D8B57D-B9C8-4AA0-8BEC-F45212795FB6@exa.com> Message-ID: Hi all, I'd like to thank Bob Peterson for helping me solve the last problem I was seeing with my storage cluster. I've got a new one now. A couple days ago, site ops plugged in a new storage shelf and this triggered some sort of error in the storage chassis. I was able to sort that out with gfs_fsck, and have since gotten the new storage recognized by the cluster. I'd like to make use of this new storage, and it's here that we run into trouble. lvextend completed with no trouble, so I ran gfs_grow. 
gfs_grow has been running for over an hour now and has not progressed past: [root at s12n01 ~]# gfs_grow /dev/s12/scratch13 FS: Mount Point: /scratch13 FS: Device: /dev/s12/scratch13 FS: Options: rw,noatime,nodiratime FS: Size: 4392290302 DEV: Size: 5466032128 Preparing to write new FS information... The load average on this node has risen from its normal ~30-40 to 513 (the number of nfsd threads, plus one), and the file system has become slow-to-inaccessible on client nodes. I am seeing messages in my log files that indicate things like: Oct 8 16:26:00 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket Oct 8 16:26:00 s12n01 last message repeated 4 times Oct 8 16:26:00 s12n01 kernel: nfsd: peername failed (err 107)! Oct 8 16:26:58 s12n01 kernel: nfsd: peername failed (err 107)! Oct 8 16:27:56 s12n01 last message repeated 2 times Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! Oct 8 16:28:34 s12n01 last message repeated 2 times Oct 8 16:30:29 s12n01 last message repeated 2 times I was seeing similar messages this morning, but those went away when I mounted this file system on another node in the cluster, turned on statfs_fast, and then moved the service to that node. I'm not sure what to do about it given that gfs_grow is running. Is this something anyone else has seen? Does anyone know what to do about this? Do I have any option other than to wait until gfs_grow is done? Given my recent experiences (see "lm_dlm_cancel" in the list archives), I'm very hesitant to hit ^C on this gfs_grow. I'm running CentOS 4 for x86-64, kernel 2.6.9-67.0.20.ELsmp. Thanks, James From andrew at ntsg.umt.edu Wed Oct 8 21:12:39 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Wed, 08 Oct 2008 15:12:39 -0600 Subject: [Linux-cluster] gfs_grow In-Reply-To: References: <81D8B57D-B9C8-4AA0-8BEC-F45212795FB6@exa.com> Message-ID: <48ED2247.8050406@ntsg.umt.edu> James, I have a CentOS 5.2 cluster where I would see the same nfs errors under certain conditions. If I did anything that introduced latency to my gfs operations on the node that served nfs, the nfs threads couldn't service requests faster than they came in from clients. Eventually my nfs threads would all be busy and start dropping nfs requests. I kept an eye on my nfsd thread utilization (/proc/net/rpc/nfsd) and kept bumping up the number of threads until they could handle all the requests while the gfs had a higher latency. In my case, I had EMC Networker streaming data from my gfs filesystems to a local scsi tape device on the same node that served nfs. I eventually separated them onto different nodes. I'm sure gfs_grow would slow down your gfs enough that your nfs threads couldn't keep up. NFS on gfs seems to be very latency sensitive. I have a quick an dirty perl script to generate a historgram image from nfs thread stats if you are interested. -Andrew -- Andrew A. 
Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 James Chamberlain wrote: > Hi all, > > I'd like to thank Bob Peterson for helping me solve the last problem I > was seeing with my storage cluster. I've got a new one now. A couple > days ago, site ops plugged in a new storage shelf and this triggered > some sort of error in the storage chassis. I was able to sort that out > with gfs_fsck, and have since gotten the new storage recognized by the > cluster. I'd like to make use of this new storage, and it's here that > we run into trouble. > > lvextend completed with no trouble, so I ran gfs_grow. gfs_grow has > been running for over an hour now and has not progressed past: > > [root at s12n01 ~]# gfs_grow /dev/s12/scratch13 > FS: Mount Point: /scratch13 > FS: Device: /dev/s12/scratch13 > FS: Options: rw,noatime,nodiratime > FS: Size: 4392290302 > DEV: Size: 5466032128 > Preparing to write new FS information... > > The load average on this node has risen from its normal ~30-40 to 513 > (the number of nfsd threads, plus one), and the file system has become > slow-to-inaccessible on client nodes. I am seeing messages in my log > files that indicate things like: > > Oct 8 16:26:00 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when > sending 140 bytes - shutting down socket > Oct 8 16:26:00 s12n01 last message repeated 4 times > Oct 8 16:26:00 s12n01 kernel: nfsd: peername failed (err 107)! > Oct 8 16:26:58 s12n01 kernel: nfsd: peername failed (err 107)! > Oct 8 16:27:56 s12n01 last message repeated 2 times > Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when > sending 140 bytes - shutting down socket > Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when > sending 140 bytes - shutting down socket > Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! > Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when > sending 140 bytes - shutting down socket > Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 when > sending 140 bytes - shutting down socket > Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! > Oct 8 16:28:34 s12n01 last message repeated 2 times > Oct 8 16:30:29 s12n01 last message repeated 2 times > > I was seeing similar messages this morning, but those went away when I > mounted this file system on another node in the cluster, turned on > statfs_fast, and then moved the service to that node. I'm not sure what > to do about it given that gfs_grow is running. Is this something anyone > else has seen? Does anyone know what to do about this? Do I have any > option other than to wait until gfs_grow is done? Given my recent > experiences (see "lm_dlm_cancel" in the list archives), I'm very > hesitant to hit ^C on this gfs_grow. I'm running CentOS 4 for x86-64, > kernel 2.6.9-67.0.20.ELsmp. > > Thanks, > > James > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From janar.kartau at gmail.com Wed Oct 8 23:24:51 2008 From: janar.kartau at gmail.com (Janar Kartau) Date: Thu, 09 Oct 2008 02:24:51 +0300 Subject: [Linux-cluster] GFS lockups ? Message-ID: <48ED4143.4070107@gmail.com> Hi, Recently our three-node webserver cluster started randomly crashing. I never had time to investigate what the problem was, cause i needed to bring them back online again. 
But it seemed like alla Apache processes just hang (couldn't even kill them).. waiting for something. The only thing that helped, was a reboot for all or couple of the nodes. Anyway, today i encountered this problem at night and i could look into it a little more. I noticed that some of the GFS filesystems were unaccessable (we have 5 of them, mounted on every nide) and of the nodes was completely unaccessable. So i guessed that this half-dead node was holding locks on the filesystems or sth. Did a hard reset on this dead node and all stabilized. Absolutely no cluster/GFS errors in the logs (besides the ones which tell that the half-dead node was leaving the cluster when i reset it). Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS storage (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used for CMAN/DLM traffic. Please give me ideas how to solve this or atleast some debugging tips as it's happening twice a day now and seems i simply can't help it. :( Janar Kartau From grimme at atix.de Thu Oct 9 06:40:58 2008 From: grimme at atix.de (Marc Grimme) Date: Thu, 9 Oct 2008 08:40:58 +0200 Subject: [Linux-cluster] GFS lockups ? In-Reply-To: <48ED4143.4070107@gmail.com> References: <48ED4143.4070107@gmail.com> Message-ID: <200810090840.58589.grimme@atix.de> On Thursday 09 October 2008 01:24:51 Janar Kartau wrote: > Hi, > Recently our three-node webserver cluster started randomly crashing. I > never had time to investigate what the problem was, cause i needed to > bring them back online again. But it seemed like alla Apache processes > just hang (couldn't even kill them).. waiting for something. The only > thing that helped, was a reboot for all or couple of the nodes. Anyway, > today i encountered this problem at night and i could look into it a > little more. I noticed that some of the GFS filesystems were > unaccessable (we have 5 of them, mounted on every nide) and of the nodes > was completely unaccessable. So i guessed that this half-dead node was > holding locks on the filesystems or sth. Did a hard reset on this dead > node and all stabilized. > Absolutely no cluster/GFS errors in the logs (besides the ones which > tell that the half-dead node was leaving the cluster when i reset it). > Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, > GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS storage > (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used > for CMAN/DLM traffic. > Please give me ideas how to solve this or atleast some debugging tips as > it's happening twice a day now and seems i simply can't help it. :( Could you provide more information like relevant syslogs and console messages? Are you using php with sessions? -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ From shawnlhood at gmail.com Thu Oct 9 05:00:00 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Thu, 9 Oct 2008 01:00:00 -0400 Subject: [Linux-cluster] GFS lockups ? In-Reply-To: <48ED4143.4070107@gmail.com> References: <48ED4143.4070107@gmail.com> Message-ID: <0C318874-8DF9-4E40-BEC9-3CD90F8C85EA@gmail.com> See my thread from yesterday. Same general thing, but the dlm kernel threads were eating cycles. Sent from my iPhone On Oct 8, 2008, at 7:24 PM, Janar Kartau wrote: > Hi, > Recently our three-node webserver cluster started randomly crashing. 
I > never had time to investigate what the problem was, cause i needed to > bring them back online again. But it seemed like alla Apache processes > just hang (couldn't even kill them).. waiting for something. The only > thing that helped, was a reboot for all or couple of the nodes. > Anyway, > today i encountered this problem at night and i could look into it a > little more. I noticed that some of the GFS filesystems were > unaccessable (we have 5 of them, mounted on every nide) and of the > nodes > was completely unaccessable. So i guessed that this half-dead node was > holding locks on the filesystems or sth. Did a hard reset on this dead > node and all stabilized. > Absolutely no cluster/GFS errors in the logs (besides the ones which > tell that the half-dead node was leaving the cluster when i reset it). > Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, > GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS > storage > (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used > for CMAN/DLM traffic. > Please give me ideas how to solve this or atleast some debugging > tips as > it's happening twice a day now and seems i simply can't help it. :( > > Janar Kartau > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From grimme at atix.de Thu Oct 9 07:10:27 2008 From: grimme at atix.de (Marc Grimme) Date: Thu, 9 Oct 2008 09:10:27 +0200 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: <200810090910.27894.grimme@atix.de> On Tuesday 07 October 2008 19:43:07 Shawn Hood wrote: > And for another follow-up in the interest of full disclosure, I don't > recall the specifics, but it seems dlm_recvd was eating up all the CPU > cycles on one of the machines, and others seemed to follow suit shortly > thereafter. Sorry for the flood! > > Shawn You might want to enable glock_purging. This should reduce or even eliminate the problems (not sure yet what it is dependent on). But normally to enable glock_purging (in my experiance) reduces the likelyhood of gfs/dlm freezes. You are sure you don't have any syslog or console message related to the cluster before? -- Gruss / Regards, Marc Grimme http://www.atix.de/ http://www.open-sharedroot.org/ From federico.simoncelli at gmail.com Thu Oct 9 11:20:45 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Thu, 9 Oct 2008 13:20:45 +0200 Subject: [Linux-cluster] Cluster monitoring Message-ID: Hi all, what is the best way to generate mail notification for cluster events such as joins/leaves/fences? I would rather not use an external monitor system like nagios and ganglia but looks like those are the best practice for now. Is there any other monitoring application/technique that I should consider? Thanks in advance, -- Federico. From hrouamba at gmail.com Thu Oct 9 11:52:45 2008 From: hrouamba at gmail.com (ROUAMBA Halidou) Date: Thu, 9 Oct 2008 11:52:45 +0000 Subject: [Linux-cluster] RHEL AS 4.7 Cluster : unable to create HP ILO fence device Message-ID: Hi , my pote I just finish installing RHEL AS 4.7 cluster suite on HP DL580 G5 platform. 
I'm create member node, cluster domain, but i can't create HP ILO fence device When i click on the OK button the bellow message is send to the system-config-cluster line commande: *[root at app-db2 ~]# system-config-cluster Traceback (most recent call last): File "/usr/share/system-config-cluster/ConfigTabController.py", line 1232, in on_fd_panel_ok return_list = self.fence_handler.validate_fencedevice(agent_type, None) File "/usr/share/system-config-cluster/FenceHandler.py", line 713, in validate_fencedevice returnlist = apply(self.fd_validate[agent_type], args) File "/usr/share/system-config-cluster/FenceHandler.py", line 932, in val_ilo_fd if self.ilo_ssh.get_active == True: AttributeError: 'NoneType' object has no attribute 'get_active' [root at app-db2 ~]#* you can find attached the file contained the screen capture. Thanks for all -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hp_ilo_screen.png Type: image/png Size: 52967 bytes Desc: not available URL: From mgrac at redhat.com Thu Oct 9 11:56:15 2008 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Thu, 09 Oct 2008 13:56:15 +0200 Subject: [Linux-cluster] My patch In-Reply-To: <48EB3B1F.1010700@craigon.co.uk> References: <48EB3B1F.1010700@craigon.co.uk> Message-ID: <48EDF15F.9030000@redhat.com> David wrote: > Did this patch ever get merged in? > > https://www.redhat.com/archives/linux-cluster/2008-August/msg00026.html No, can you please create a bug in bugzilla? If it works without any problem (as we don't have such device), I can apply it. m, -- Marek Grac Red Hat Czech s.r.o. From jamesc at exa.com Thu Oct 9 15:18:11 2008 From: jamesc at exa.com (James Chamberlain) Date: Thu, 9 Oct 2008 11:18:11 -0400 Subject: [Linux-cluster] gfs_grow In-Reply-To: <48ED2247.8050406@ntsg.umt.edu> References: <81D8B57D-B9C8-4AA0-8BEC-F45212795FB6@exa.com> <48ED2247.8050406@ntsg.umt.edu> Message-ID: <5C2F3859-F436-42DB-8FB2-94ABAD52CD73@exa.com> Thanks Andrew. What I'm really hoping for is anything I can do to make this gfs_grow go faster. It's been running for 19 hours now, I have no idea when it'll complete, and the file system I'm trying to grow has been all but unusable for the duration. This is a very busy file system, and I know it's best to run gfs_grow on a quiet file system, but there isn't too much I can do about that. Alternatively, if anyone knows of a signal I could send to gfs_grow that would cause it to give a status report or increase verbosity, that would be helpful, too. I have tried both increasing and decreasing the number of NFS threads, but since I can't tell where I am in the process or how quickly it's going, I have no idea what effect this has on operations. Thanks, James On Oct 8, 2008, at 5:12 PM, Andrew A. Neuschwander wrote: > James, > > I have a CentOS 5.2 cluster where I would see the same nfs errors > under certain conditions. If I did anything that introduced latency > to my gfs operations on the node that served nfs, the nfs threads > couldn't service requests faster than they came in from clients. > Eventually my nfs threads would all be busy and start dropping nfs > requests. I kept an eye on my nfsd thread utilization (/proc/net/rpc/ > nfsd) and kept bumping up the number of threads until they could > handle all the requests while the gfs had a higher latency. > > In my case, I had EMC Networker streaming data from my gfs > filesystems to a local scsi tape device on the same node that served > nfs. 
I eventually separated them onto different nodes. > > I'm sure gfs_grow would slow down your gfs enough that your nfs > threads couldn't keep up. NFS on gfs seems to be very latency > sensitive. I have a quick an dirty perl script to generate a > historgram image from nfs thread stats if you are interested. > > -Andrew > -- > Andrew A. Neuschwander, RHCE > Linux Systems/Software Engineer > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > > James Chamberlain wrote: >> Hi all, >> I'd like to thank Bob Peterson for helping me solve the last >> problem I was seeing with my storage cluster. I've got a new one >> now. A couple days ago, site ops plugged in a new storage shelf >> and this triggered some sort of error in the storage chassis. I >> was able to sort that out with gfs_fsck, and have since gotten the >> new storage recognized by the cluster. I'd like to make use of >> this new storage, and it's here that we run into trouble. >> lvextend completed with no trouble, so I ran gfs_grow. gfs_grow >> has been running for over an hour now and has not progressed past: >> [root at s12n01 ~]# gfs_grow /dev/s12/scratch13 >> FS: Mount Point: /scratch13 >> FS: Device: /dev/s12/scratch13 >> FS: Options: rw,noatime,nodiratime >> FS: Size: 4392290302 >> DEV: Size: 5466032128 >> Preparing to write new FS information... >> The load average on this node has risen from its normal ~30-40 to >> 513 (the number of nfsd threads, plus one), and the file system has >> become slow-to-inaccessible on client nodes. I am seeing messages >> in my log files that indicate things like: >> Oct 8 16:26:00 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >> when sending 140 bytes - shutting down socket >> Oct 8 16:26:00 s12n01 last message repeated 4 times >> Oct 8 16:26:00 s12n01 kernel: nfsd: peername failed (err 107)! >> Oct 8 16:26:58 s12n01 kernel: nfsd: peername failed (err 107)! >> Oct 8 16:27:56 s12n01 last message repeated 2 times >> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >> when sending 140 bytes - shutting down socket >> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >> when sending 140 bytes - shutting down socket >> Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! >> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >> when sending 140 bytes - shutting down socket >> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >> when sending 140 bytes - shutting down socket >> Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! >> Oct 8 16:28:34 s12n01 last message repeated 2 times >> Oct 8 16:30:29 s12n01 last message repeated 2 times >> I was seeing similar messages this morning, but those went away >> when I mounted this file system on another node in the cluster, >> turned on statfs_fast, and then moved the service to that node. >> I'm not sure what to do about it given that gfs_grow is running. >> Is this something anyone else has seen? Does anyone know what to >> do about this? Do I have any option other than to wait until >> gfs_grow is done? Given my recent experiences (see "lm_dlm_cancel" >> in the list archives), I'm very hesitant to hit ^C on this >> gfs_grow. I'm running CentOS 4 for x86-64, kernel >> 2.6.9-67.0.20.ELsmp. 
>> Thanks, >> James >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.montagutelli at unilim.fr Thu Oct 9 17:32:06 2008 From: xavier.montagutelli at unilim.fr (Xavier Montagutelli) Date: Thu, 09 Oct 2008 19:32:06 +0200 Subject: [Linux-cluster] RHEL AS 4.7 Cluster : unable to create HP ILO fence device In-Reply-To: References: Message-ID: <48EE4016.402@unilim.fr> ROUAMBA Halidou a ?crit : > > Hi , my pote > > I just finish installing RHEL AS 4.7 cluster suite on HP DL580 G5 > platform. > I'm create member node, cluster domain, but i can't create HP ILO > fence device > When i click on the OK button the bellow message is send to the > system-config-cluster line commande: > *[root at app-db2 ~]# system-config-cluster > Traceback (most recent call last): > File "/usr/share/system-config-cluster/ConfigTabController.py", line > 1232, in on_fd_panel_ok > > * If you have problem with the GUI, it's easy to modify the configuration file directly (/etc/cluster/cluster.conf). 1) increment the "config_version" attribute 2) add a fence device : 3) modify your cluster node : 4) inform ccs of the change : ccs_tool upgrade /etc/cluster/cluster.conf After that, you can test your fence device with the command : fence_node app-db1 -- Xavier Montagutelli From janar.kartau at gmail.com Thu Oct 9 18:38:43 2008 From: janar.kartau at gmail.com (Janar Kartau) Date: Thu, 09 Oct 2008 21:38:43 +0300 Subject: [Linux-cluster] GFS lockups ? In-Reply-To: <200810090840.58589.grimme@atix.de> References: <48ED4143.4070107@gmail.com> <200810090840.58589.grimme@atix.de> Message-ID: <48EE4FB3.8040600@gmail.com> Like i said, i couldn't find anything in the logs besides eviction messages after i manually reset the server. Yes, we do use PHP and sessions which use memcached as a backend. Janar Marc Grimme wrote: > On Thursday 09 October 2008 01:24:51 Janar Kartau wrote: > >> Hi, >> Recently our three-node webserver cluster started randomly crashing. I >> never had time to investigate what the problem was, cause i needed to >> bring them back online again. But it seemed like alla Apache processes >> just hang (couldn't even kill them).. waiting for something. The only >> thing that helped, was a reboot for all or couple of the nodes. Anyway, >> today i encountered this problem at night and i could look into it a >> little more. I noticed that some of the GFS filesystems were >> unaccessable (we have 5 of them, mounted on every nide) and of the nodes >> was completely unaccessable. So i guessed that this half-dead node was >> holding locks on the filesystems or sth. Did a hard reset on this dead >> node and all stabilized. >> Absolutely no cluster/GFS errors in the logs (besides the ones which >> tell that the half-dead node was leaving the cluster when i reset it). >> Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, >> GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS storage >> (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used >> for CMAN/DLM traffic. >> Please give me ideas how to solve this or atleast some debugging tips as >> it's happening twice a day now and seems i simply can't help it. :( >> > > Could you provide more information like relevant syslogs and console messages? 
> > Are you using php with sessions? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janar.kartau at gmail.com Thu Oct 9 18:40:42 2008 From: janar.kartau at gmail.com (Janar Kartau) Date: Thu, 09 Oct 2008 21:40:42 +0300 Subject: [Linux-cluster] GFS lockups ? In-Reply-To: <0C318874-8DF9-4E40-BEC9-3CD90F8C85EA@gmail.com> References: <48ED4143.4070107@gmail.com> <0C318874-8DF9-4E40-BEC9-3CD90F8C85EA@gmail.com> Message-ID: <48EE502A.2040309@gmail.com> Hm.. didn't notice it before. Anyway, i didn't notice that dlm was doing any more job than usually. The most CPU-consuming processes on the alive nodes was "top" itself (although the load was around 600 because of the hang Apache procs). Janar Shawn Hood wrote: > See my thread from yesterday. Same general thing, but the dlm kernel > threads were eating cycles. > > Sent from my iPhone > > On Oct 8, 2008, at 7:24 PM, Janar Kartau wrote: > >> Hi, >> Recently our three-node webserver cluster started randomly crashing. I >> never had time to investigate what the problem was, cause i needed to >> bring them back online again. But it seemed like alla Apache processes >> just hang (couldn't even kill them).. waiting for something. The only >> thing that helped, was a reboot for all or couple of the nodes. Anyway, >> today i encountered this problem at night and i could look into it a >> little more. I noticed that some of the GFS filesystems were >> unaccessable (we have 5 of them, mounted on every nide) and of the nodes >> was completely unaccessable. So i guessed that this half-dead node was >> holding locks on the filesystems or sth. Did a hard reset on this dead >> node and all stabilized. >> Absolutely no cluster/GFS errors in the logs (besides the ones which >> tell that the half-dead node was leaving the cluster when i reset it). >> Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, >> GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS storage >> (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used >> for CMAN/DLM traffic. >> Please give me ideas how to solve this or atleast some debugging tips as >> it's happening twice a day now and seems i simply can't help it. :( >> >> Janar Kartau >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From grimme at atix.de Thu Oct 9 19:19:57 2008 From: grimme at atix.de (Marc Grimme) Date: Thu, 9 Oct 2008 21:19:57 +0200 Subject: [Linux-cluster] GFS lockups ? In-Reply-To: <48EE4FB3.8040600@gmail.com> References: <48ED4143.4070107@gmail.com> <200810090840.58589.grimme@atix.de> <48EE4FB3.8040600@gmail.com> Message-ID: <200810092119.57501.grimme@atix.de> On Thursday 09 October 2008 20:38:43 Janar Kartau wrote: > Like i said, i couldn't find anything in the logs besides eviction > messages after i manually reset the server. Yes, we do use PHP and > sessions which use memcached as a backend. Don't know much about memcached as a backend but I recall we finally patched php so it uses flocks (as far as I remember or you can at least configure how you want to use session-filelocking) and after it apache is pretty stable. No *D*s any more because of this. I don't know what the status is with the php patch but I think it's still somewhere. I need to check back on this. -marc. 
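For the next time the hang occurs, a rough way to confirm that the stuck Apache children really are sitting in uninterruptible sleep (state D) on a GFS/DLM lock, using only standard tools (nothing below is specific to this cluster):

# List processes in uninterruptible sleep and the kernel function they are
# blocked in; httpd processes stuck in a gfs/dlm wait point at lock
# contention rather than an application-level hang.
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# Dump the kernel stacks of all tasks to the kernel log for a closer look
# (the output in dmesg/syslog can be large).
echo t > /proc/sysrq-trigger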
> > Janar > > Marc Grimme wrote: > > On Thursday 09 October 2008 01:24:51 Janar Kartau wrote: > >> Hi, > >> Recently our three-node webserver cluster started randomly crashing. I > >> never had time to investigate what the problem was, cause i needed to > >> bring them back online again. But it seemed like alla Apache processes > >> just hang (couldn't even kill them).. waiting for something. The only > >> thing that helped, was a reboot for all or couple of the nodes. Anyway, > >> today i encountered this problem at night and i could look into it a > >> little more. I noticed that some of the GFS filesystems were > >> unaccessable (we have 5 of them, mounted on every nide) and of the nodes > >> was completely unaccessable. So i guessed that this half-dead node was > >> holding locks on the filesystems or sth. Did a hard reset on this dead > >> node and all stabilized. > >> Absolutely no cluster/GFS errors in the logs (besides the ones which > >> tell that the half-dead node was leaving the cluster when i reset it). > >> Nodes have CentOS 4.6 installed (2.6.9-67.0.7.ELsmp, dlm-1.0.7-1, > >> GFS-6.1.15-1, cman-1.0.17-0.el4_6.5). We use EMC CX3-10c for GFS storage > >> (over iSCSI) and EMC PowerPath for multipathing. Separate VLAN is used > >> for CMAN/DLM traffic. > >> Please give me ideas how to solve this or atleast some debugging tips as > >> it's happening twice a day now and seems i simply can't help it. :( > > > > Could you provide more information like relevant syslogs and console > > messages? > > > > Are you using php with sessions? -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX Informationstechnologie und Consulting AG Einsteinstr. 10 85716 Unterschleissheim Deutschland/Germany Phone: +49-89 452 3538-0 Fax: +49-89 990 1766-0 Registergericht: Amtsgericht Muenchen Registernummer: HRB 168930 USt.-Id.: DE209485962 Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) Vorsitzender des Aufsichtsrats: Dr. Martin Buss From terrybdavis at gmail.com Thu Oct 9 20:01:04 2008 From: terrybdavis at gmail.com (Terry Davis) Date: Thu, 9 Oct 2008 15:01:04 -0500 Subject: [Linux-cluster] clustering inside of vmware -- fencing Message-ID: <14139e3a0810091301q3b1d006al829dc52157bcdb22@mail.gmail.com> Hello, I am trying to set up a RHEL5 cluster inside of VMware (client request). I see there are hints of a fence_vmware script floating around. I found one on sources.redhat.com but this, coupled with the latest toolkit from vmware yields a missing VMware::VmPerl.pm file. This got me to step back and think about this a bit further. 1) is this a supported configuration? 2) what am I missing with the fence_vmware script? 3) can anyone share any working configurations with this? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.osullivan at auckland.ac.nz Thu Oct 9 20:30:48 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Fri, 10 Oct 2008 09:30:48 +1300 (NZDT) Subject: [Linux-cluster] RE: Can't create LV in 2-node cluster Message-ID: <1385.128.187.136.18.1223584248.squirrel@mail.esc.auckland.ac.nz> * From: John Ruemker * To: linux clustering * Subject: Re: [Linux-cluster] RE: Can't create LV in 2-node cluster * Date: Wed, 08 Oct 2008 14:33:45 -0400 michael osullivan auckland ac nz wrote: Hi Mark, clvmd is running fine on both nodes. The result of "service clvmd status" is clvmd (pid xxxxx) is running... 
active volumes: LogVol00 LogVol01 The result of vgscan is Reading all physical volumes. This may take a while... Found volume group "iscsi_raid_vg" using metadata type lvm2 Found volume group "VolGroup00" using metadata type lvm2 I just can't create a logical volume either from the command line or using system-config-lvm... > Did you partition the device before adding a physical volume to it? If so, did you run partprobe on both nodes? A common scenario is to partition the device from node 1 and create a physical volume on it. However the partition table is not automatically read on the second node so it has no idea there is a partition there. When clvmd tells the second node to activate a vg or lv on this unknown device, that node responds that it can't lock on to the device since it has no idea what it is. If you do end > up in this situation then usually the solution is to do this on both nodes > > # rm /etc/lvm/cache/.cache > # partprobe > # clvmd -R > > Then from one node: > > # pvscan > # vgscan > # lvchange -ay vg/lv > > > Try this and see if it helps. > > -John Thanks for your help John, I didn't partition the device before adding the physical volume, I just used pvcreate. I tried you advice except for the last step as I have not managed to create a logical volume to activate. The error changed a little: lvcreate -n iscsi_raid_lv -l 4882 iscsi_raid_vg gives Error locking on node : Error backing up metadata, can't find VG for group vg Aborting. Failed to activate new LV to wipe the start of it. Note that "group vg" used to be "group #global" before I tried your solution. Any other ideas? This has got me really stumped. The commands I used to create the physical volume and volume group were: pvcreate /dev/iscsi_raid vgcreate -cy iscsi_raid_vg /dev/iscsi_raid But now lvcreate won't cooperate. Thanks again for any help. Also thanks for your previous help too. Thanks, Mike From andrew at ntsg.umt.edu Thu Oct 9 20:55:57 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Thu, 09 Oct 2008 14:55:57 -0600 Subject: [Linux-cluster] clustering inside of vmware -- fencing In-Reply-To: <14139e3a0810091301q3b1d006al829dc52157bcdb22@mail.gmail.com> References: <14139e3a0810091301q3b1d006al829dc52157bcdb22@mail.gmail.com> Message-ID: <48EE6FDD.1050904@ntsg.umt.edu> VMware::VmPerl.pm a deprecated api. The current vmware perl api is the VI Perl Toolkit (VMware::VIM2Runtime). I have a centos 5.2 gfs cluster running in a VMware ESX cluster (with virtualcenter). I have a mix of CentOS VMs and physical machines participating in the GFS cluster. I modified the fence_vixel agent to use this new api, and called it fence_vi3. It's fairly basic and works, but could use some improvements. I've been using it for almost a year: https://www.redhat.com/archives/cluster-devel/2007-November/msg00056.html The old fence_vmware agent logs into a single ESX/GSX server using the old api and resets the targeted guest. fence_vi3 logs into the VirtualCenter and resets the requested VM. This is needed in a ESX cluster, since you don't know on which ESX machine your Centos/Rhel guest is running. I don't know if this is a supported setup, but it works. I've done a lot of heavy testing and optimizing of gfs in this setup. My volume group which hold my gfs filesystems is 14TB and has been in production for a good 6 months. Obviously, YMMV. -Andrew -- Andrew A. 
Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 Terry Davis wrote: > Hello, > I am trying to set up a RHEL5 cluster inside of VMware (client request). I > see there are hints of a fence_vmware script floating around. I found one > on sources.redhat.com but this, coupled with the latest toolkit from vmware > yields a missing VMware::VmPerl.pm file. This got me to step back and think > about this a bit further. > > 1) is this a supported configuration? > 2) what am I missing with the fence_vmware script? > 3) can anyone share any working configurations with this? > > Thanks! > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kanderso at redhat.com Thu Oct 9 21:18:42 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 09 Oct 2008 16:18:42 -0500 Subject: [Linux-cluster] clustering inside of vmware -- fencing In-Reply-To: <48EE6FDD.1050904@ntsg.umt.edu> References: <14139e3a0810091301q3b1d006al829dc52157bcdb22@mail.gmail.com> <48EE6FDD.1050904@ntsg.umt.edu> Message-ID: <1223587122.3016.18.camel@dhcp80-204.msp.redhat.com> There is a new implementation of the fence_vmware agent written in python in the GIT tree: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=tree;f=fence/agents/vmware;h=d07bfd7a8d6f445f15f793626cf493a9edf11833;hb=refs/heads/master This one uses the new libfence python infrastructure and hopefully does what you need. Check it out and let us know. Kevin On Thu, 2008-10-09 at 14:55 -0600, Andrew A. Neuschwander wrote: > VMware::VmPerl.pm a deprecated api. The current vmware perl api is the > VI Perl Toolkit (VMware::VIM2Runtime). I have a centos 5.2 gfs cluster > running in a VMware ESX cluster (with virtualcenter). I have a mix of > CentOS VMs and physical machines participating in the GFS cluster. > > I modified the fence_vixel agent to use this new api, and called it > fence_vi3. It's fairly basic and works, but could use some improvements. > I've been using it for almost a year: > > https://www.redhat.com/archives/cluster-devel/2007-November/msg00056.html > > The old fence_vmware agent logs into a single ESX/GSX server using the > old api and resets the targeted guest. fence_vi3 logs into the > VirtualCenter and resets the requested VM. This is needed in a ESX > cluster, since you don't know on which ESX machine your Centos/Rhel > guest is running. > > I don't know if this is a supported setup, but it works. I've done a lot > of heavy testing and optimizing of gfs in this setup. My volume group > which hold my gfs filesystems is 14TB and has been in production for a > good 6 months. Obviously, YMMV. > > -Andrew > -- > Andrew A. Neuschwander, RHCE > Linux Systems/Software Engineer > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > > Terry Davis wrote: > > Hello, > > I am trying to set up a RHEL5 cluster inside of VMware (client request). I > > see there are hints of a fence_vmware script floating around. I found one > > on sources.redhat.com but this, coupled with the latest toolkit from vmware > > yields a missing VMware::VmPerl.pm file. This got me to step back and think > > about this a bit further. > > > > 1) is this a supported configuration? 
> > 2) what am I missing with the fence_vmware script? > > 3) can anyone share any working configurations with this? > > > > Thanks! > > > > > > > > ------------------------------------------------------------------------ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jamesc at exa.com Thu Oct 9 22:53:58 2008 From: jamesc at exa.com (James Chamberlain) Date: Thu, 9 Oct 2008 18:53:58 -0400 Subject: [Linux-cluster] gfs_grow In-Reply-To: <5C2F3859-F436-42DB-8FB2-94ABAD52CD73@exa.com> References: <81D8B57D-B9C8-4AA0-8BEC-F45212795FB6@exa.com> <48ED2247.8050406@ntsg.umt.edu> <5C2F3859-F436-42DB-8FB2-94ABAD52CD73@exa.com> Message-ID: <0FA1858B-DA74-4D65-95CB-7EC21559FA6F@exa.com> The gfs_grow did finally complete, but now I've got another problem: Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: fatal: invalid metadata block Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: bh = 4314413922 (type: exp=5, found=4) Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: function = gfs_get_meta_buffer Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: file = / builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/gfs/dio.c, line = 1223 Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: time = 1223589349 Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: about to withdraw from the cluster Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: waiting for outstanding I/O Oct 9 17:55:49 s12n03 kernel: GFS: fsid=s12:scratch13.1: telling LM to withdraw Oct 9 17:55:50 s12n01 kernel: GFS: fsid=s12:scratch13.2: jid=1: Trying to acquire journal lock... Oct 9 17:55:50 s12n01 kernel: GFS: fsid=s12:scratch13.2: jid=1: Busy Oct 9 17:55:50 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Trying to acquire journal lock... Oct 9 17:55:50 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Looking at journal... Oct 9 17:55:51 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Acquiring the transaction lock... Oct 9 17:55:51 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Replaying journal... Oct 9 17:55:52 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Replayed 1637 of 3945 blocks Oct 9 17:55:52 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: replays = 1637, skips = 115, sames = 2193 Oct 9 17:55:52 s12n03 kernel: lock_dlm: withdraw abandoned memory Oct 9 17:55:52 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Journal replayed in 2s Oct 9 17:55:52 s12n03 kernel: GFS: fsid=s12:scratch13.1: withdrawn Oct 9 17:55:52 s12n02 kernel: GFS: fsid=s12:scratch13.0: jid=1: Done Oct 9 17:56:26 s12n03 clurgmgrd: [6611]: clusterfs:gfs- scratch13: Mount point is not accessible! Oct 9 17:56:26 s12n03 clurgmgrd[6611]: status on clusterfs:gfs-scratch13 returned 1 (generic error) Oct 9 17:56:26 s12n03 clurgmgrd[6611]: Stopping service scratch13 Oct 9 17:56:26 s12n03 clurgmgrd: [6611]: Removing IPv4 address 10.14.12.5 from bond0 Oct 9 17:56:36 s12n03 clurgmgrd: [6611]: /scratch13 is not a directory Oct 9 17:56:36 s12n03 clurgmgrd[6611]: stop on nfsclient:nfs- scratch13 returned 2 (invalid argument(s)) Oct 9 17:56:36 s12n03 clurgmgrd[6611]: #12: RG scratch13 failed to stop; intervention required Oct 9 17:56:36 s12n03 clurgmgrd[6611]: Service scratch13 is failed The history here is that a new storage shelf was added to the chassis. 
This somehow triggered an error on the chassis - a timeout of some sort, as I understand it from Site Ops - which I presume triggered this problem on this file system, since the two events were coincident. I have run gfs_fsck against this file system, but it didn't fix the problem - even when I used a newer version of gfs_fsck from RHEL 5 that had been back-ported to RHEL4. I had done this a couple of times before running the gfs_grow, and had hoped that the problem had been taken care of. Apparently not. Does anyone have any thoughts here? I can make the file system available again by killing off anything I suspect might be accessing that invalid metadata block, but that's not a good solution. Thanks, James On Oct 9, 2008, at 11:18 AM, James Chamberlain wrote: > Thanks Andrew. > > What I'm really hoping for is anything I can do to make this > gfs_grow go faster. It's been running for 19 hours now, I have no > idea when it'll complete, and the file system I'm trying to grow has > been all but unusable for the duration. This is a very busy file > system, and I know it's best to run gfs_grow on a quiet file system, > but there isn't too much I can do about that. Alternatively, if > anyone knows of a signal I could send to gfs_grow that would cause > it to give a status report or increase verbosity, that would be > helpful, too. I have tried both increasing and decreasing the > number of NFS threads, but since I can't tell where I am in the > process or how quickly it's going, I have no idea what effect this > has on operations. > > Thanks, > > James > > On Oct 8, 2008, at 5:12 PM, Andrew A. Neuschwander wrote: > >> James, >> >> I have a CentOS 5.2 cluster where I would see the same nfs errors >> under certain conditions. If I did anything that introduced latency >> to my gfs operations on the node that served nfs, the nfs threads >> couldn't service requests faster than they came in from clients. >> Eventually my nfs threads would all be busy and start dropping nfs >> requests. I kept an eye on my nfsd thread utilization (/proc/net/ >> rpc/nfsd) and kept bumping up the number of threads until they >> could handle all the requests while the gfs had a higher latency. >> >> In my case, I had EMC Networker streaming data from my gfs >> filesystems to a local scsi tape device on the same node that >> served nfs. I eventually separated them onto different nodes. >> >> I'm sure gfs_grow would slow down your gfs enough that your nfs >> threads couldn't keep up. NFS on gfs seems to be very latency >> sensitive. I have a quick an dirty perl script to generate a >> historgram image from nfs thread stats if you are interested. >> >> -Andrew >> -- >> Andrew A. Neuschwander, RHCE >> Linux Systems/Software Engineer >> College of Forestry and Conservation >> The University of Montana >> http://www.ntsg.umt.edu >> andrew at ntsg.umt.edu - 406.243.6310 >> >> >> James Chamberlain wrote: >>> Hi all, >>> I'd like to thank Bob Peterson for helping me solve the last >>> problem I was seeing with my storage cluster. I've got a new one >>> now. A couple days ago, site ops plugged in a new storage shelf >>> and this triggered some sort of error in the storage chassis. I >>> was able to sort that out with gfs_fsck, and have since gotten the >>> new storage recognized by the cluster. I'd like to make use of >>> this new storage, and it's here that we run into trouble. >>> lvextend completed with no trouble, so I ran gfs_grow. 
gfs_grow >>> has been running for over an hour now and has not progressed past: >>> [root at s12n01 ~]# gfs_grow /dev/s12/scratch13 >>> FS: Mount Point: /scratch13 >>> FS: Device: /dev/s12/scratch13 >>> FS: Options: rw,noatime,nodiratime >>> FS: Size: 4392290302 >>> DEV: Size: 5466032128 >>> Preparing to write new FS information... >>> The load average on this node has risen from its normal ~30-40 to >>> 513 (the number of nfsd threads, plus one), and the file system >>> has become slow-to-inaccessible on client nodes. I am seeing >>> messages in my log files that indicate things like: >>> Oct 8 16:26:00 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >>> when sending 140 bytes - shutting down socket >>> Oct 8 16:26:00 s12n01 last message repeated 4 times >>> Oct 8 16:26:00 s12n01 kernel: nfsd: peername failed (err 107)! >>> Oct 8 16:26:58 s12n01 kernel: nfsd: peername failed (err 107)! >>> Oct 8 16:27:56 s12n01 last message repeated 2 times >>> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >>> when sending 140 bytes - shutting down socket >>> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >>> when sending 140 bytes - shutting down socket >>> Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! >>> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >>> when sending 140 bytes - shutting down socket >>> Oct 8 16:27:56 s12n01 kernel: rpc-srv/tcp: nfsd: got error -104 >>> when sending 140 bytes - shutting down socket >>> Oct 8 16:27:56 s12n01 kernel: nfsd: peername failed (err 107)! >>> Oct 8 16:28:34 s12n01 last message repeated 2 times >>> Oct 8 16:30:29 s12n01 last message repeated 2 times >>> I was seeing similar messages this morning, but those went away >>> when I mounted this file system on another node in the cluster, >>> turned on statfs_fast, and then moved the service to that node. >>> I'm not sure what to do about it given that gfs_grow is running. >>> Is this something anyone else has seen? Does anyone know what to >>> do about this? Do I have any option other than to wait until >>> gfs_grow is done? Given my recent experiences (see >>> "lm_dlm_cancel" in the list archives), I'm very hesitant to hit ^C >>> on this gfs_grow. I'm running CentOS 4 for x86-64, kernel >>> 2.6.9-67.0.20.ELsmp. >>> Thanks, >>> James >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Fri Oct 10 04:53:45 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 10 Oct 2008 06:53:45 +0200 Subject: [Linux-cluster] Cluster Summit Report Message-ID: <1223614425.18797.17.camel@daitarn-fedora.int.fabbione.net> Hi all, The general feeling was that the Cluster Summit was a very good experience for everybody and that the amount of work done during those 3 days would have taken months on normal communication media. Of the 3 days schedule only 2 and half were required as the people have been way more efficient than expected. A lot of the pre-scheduled discussions have been dropped in a natural fashion as they were absorbed, discussed or deprecated at the source into other talks. 
People, coming from different environments with different experience and use cases, made a huge difference. While we did discuss to a greater level of technical details, this is a short summary of what will happen (in no particular order): Tree's splitting: - This item should be first and last at the same time. As a consequence of what has been decided, almost all trees will need to be divided and reorganized differently. As an example, RedHat specific bits will remain in one tree, while common components (such as dlm and fencing+fencing agents) will leave in their own separate projects. Details of the split are still to be determined. Low hanging fruits will be done first (gnbd and gfs* for example). - We discussed using clusterlabs.org as the go-to page for users, listing the versions of the latest (stable) components from all sources. The openSUSE Build Service could then be used as a hosting provider for this "community distro". - For the heartbeat tree, all that will eventually remain in it is the heartbeat "cluster infrastructure layer" (can't drop for backwards compatibility for a while). - Eventually some core libraries will migrate into corosync. - fabbione to coordinate the splitting. - lmb will coordinate the Linux-HA split and help with the build service stuff (if we go ahead with that). Standard fencing: - fencing daemon, libraries and agents will be merged (from RedHat and heartbeat) into two new projects (so that agents can be released independently from the daemon/libs). - fencing project will grow a simulator for regression testing (honza). The simulator will be a simple set of scripts that collect outputs from all known fencing devices and pass them back to the agents to test functionalities. While not perfect, it will still allow to do basic regression testing. We discussed this in terms of rewriting the RAs as simple python classes, which would interact with the world through IO abstractions (which would then be easy to capture/replay). - honzaf will write up an ABI/API for the agents which merges both functionalities and features. - Possibly agents will need to be rewritten/re-factored as part of the merge; some of the C plug-ins might become python classes etc - lmb, dejan, honza and dct to work on it. Release time lines: - As the trees will merge and split into separate projects, RM's will coordinate effort to make sure the new work will be available as modular as possible. - All releases will be available in neutral area for users to download in one shot as discussed previously. Standard logging: - Everybody to standardize on logsys. - The log recorder is worth mentioning here - buffering debug logging so that it can be dumped (retroactively) when a fault is encountered. Very useful feature. - heartbeat has a hb_report feature to gather logs, configurations, stack traces from core dumps etc from all cluster nodes, that'll be extended over time to support all this too - New features will be required in logsys to improve the user experience. Init scripts: - agreed that all init scripts shipped from upstream need to be LSB compliant and work in a distribution independent way. Users should not need to care when installing from our tarballs. - With portable packages, any differences should be hidden in there. Packaging from upstream: - in order to speed up adoption, our plan is to ship .spec and debian/ packaging format directly from upstream and with support from packagers. 
This will greatly reduce the time of propagation from upstream release into users that do not like installing manually. Packages can be built using the openSUSE build service to avoid requirement on new infrastructure. Standard quorum service: - Chrissie to implement the service within corosync/openais. - API has been discussed and explained in depth. Standard configuration: - New stack will standardize on CIB (from pacemaker). CIB is approx. a ccsd on steroids. - fabbione to look into CIB, and port libccs to libcib. - chrissie to port LDAP loader to CIB. Common shell scripting library for RA's: - Agreed to merge and review all RA's. This is a natural step as rgmanager will be deprecated. - lon and dejan to work on it. Clustered Samba: - More detailed investigation required but the short line is that performance testing are required. - Might require RA. - Investigate benefit from infiniband. - Nice to see samba integrated with corosync/openais. Split site: - There are 2 main scenarios for split site: - Metropolitan Area Clusters: "low" latency, redundancy affordable - Wide Area Clusters: high latency, expensive redundancy Each case has different problematic s (as latency and speed of the links). We will start tackling "remote" and only service/application fail-over. Data Replication will come later as users will demand it. - lmb to write the code for the "3rd site quorum" service tied into pacemaker resource dependency framework. - Identified need for some additional RAs to coordinate routing/address resolution switch-over; interfacing with routing protocols (BGP4/OSPF/etc) and DNS. Misc: - corosync release cycles - "Flatiron" to be released in time for February (+ Wilson/openAIS) - Need to understand effects of RDMA versus IP over infiniband - openSharedRoot presentation - Lots of unsolved issues, mostly related to clunky CDSL emulation, and the need to bring up significant portions of the stack before mounting root - NTT: - Raised lots of issues about supportability too - NTT will drive a stonith agent which works nicely with crashdumps too From wferi at niif.hu Fri Oct 10 09:08:27 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Fri, 10 Oct 2008 11:08:27 +0200 Subject: [Linux-cluster] Cluster Summit Report In-Reply-To: <1223614425.18797.17.camel@daitarn-fedora.int.fabbione.net> (Fabio M. Di Nitto's message of "Fri, 10 Oct 2008 06:53:45 +0200") References: <1223614425.18797.17.camel@daitarn-fedora.int.fabbione.net> Message-ID: <87hc7ku8ac.fsf@tac.ki.iif.hu> "Fabio M. Di Nitto" writes: > - Agreed to merge and review all RA's. This is a natural step as > rgmanager will be deprecated. Thanks for the report, it was a very interesting read. And what will take the place of rgmanager? -- Regards, Feri. From stefano.biagiotti at vola.it Fri Oct 10 10:58:38 2008 From: stefano.biagiotti at vola.it (Stefano Biagiotti) Date: Fri, 10 Oct 2008 12:58:38 +0200 Subject: [Linux-cluster] RILOE and fencing Message-ID: <20081010105838.GA31738@palermo.priv2.gtn.it> I'm testing GFS on a 2-node cluster with CentOS-5. I'm succesfully running qdiskd, cman, clvmd, gfs and the GFS filesystem is up and running on both nodes, but I have some issues with fencing when I simulate a network outage. The stonith device on both nodes is Compaq RILOE (Remote Insight Lights-Out Edition). The fence_ilo command doesn't work to me... # fence_ilo -a node2-riloe -l admin -p $password -o off failed to turn off Is fence_ilo the correct tool to use with RILOE? 
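One quick check that does not depend on the fence agent itself: fence_ilo drives the management processor's RIBCL XML interface over SSL, port 443 by default, so it is worth confirming that the RILOE board answers there at all (node2-riloe below is just the hostname from the command above):

# If the SSL handshake fails or the connection is refused, the card may
# simply not speak the iLO XML protocol, which would explain the
# "failed to turn off" result.
openssl s_client -connect node2-riloe:443 < /dev/null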
I found fence_rib, but the manpage says it's deprecated: Name fence_rib - I/O Fencing agent for Compaq Remote Insight Lights Out card Description fence_rib is deprecated. fence_ilo should be used instead See Also fence_ilo(8) Thank you in advance. -- Stefano Biagiotti From fdinitto at redhat.com Sat Oct 11 07:20:12 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Sat, 11 Oct 2008 09:20:12 +0200 (CEST) Subject: [Linux-cluster] Cluster Summit Report In-Reply-To: <87hc7ku8ac.fsf@tac.ki.iif.hu> References: <1223614425.18797.17.camel@daitarn-fedora.int.fabbione.net> <87hc7ku8ac.fsf@tac.ki.iif.hu> Message-ID: On Fri, 10 Oct 2008, Ferenc Wagner wrote: > "Fabio M. Di Nitto" writes: > >> - Agreed to merge and review all RA's. This is a natural step as >> rgmanager will be deprecated. > > Thanks for the report, it was a very interesting read. > And what will take the place of rgmanager? rgmanager will be around for backward compatibility for one release to allow people a smooth upgrade. It will be replaced by pacemaker in the long run. Fabio -- I'm going to make him an offer he can't refuse. From sanelson at gmail.com Sat Oct 11 16:12:42 2008 From: sanelson at gmail.com (Stephen Nelson-Smith) Date: Sat, 11 Oct 2008 17:12:42 +0100 Subject: [Linux-cluster] Cib.xml --> haresources Message-ID: Hi, I want to managed a linux-ha cluster (heartbeat 2) with puppet, however I think I'm likely to struggle with cib.xml, as it contains runtime info, and the cluster software seems like it won't take kindly to being deployed automatically. I think the approach is to put the config in an haresources file, and use the haresources conversion to cib.xml tool. However, how do I put the knowledge I've put into my cib.xml (attached) into haresources? Or am I missing the point entirely and there's a much better/easier way to do this? S. -------------- next part -------------- A non-text attachment was scrubbed... Name: cib.xml Type: text/xml Size: 3390 bytes Desc: not available URL: From jmacfarland at nexatech.com Mon Oct 13 15:35:40 2008 From: jmacfarland at nexatech.com (Jeff Macfarland) Date: Mon, 13 Oct 2008 10:35:40 -0500 Subject: [Linux-cluster] Cluster monitoring In-Reply-To: References: Message-ID: <48F36ACC.3000307@nexatech.com> Federico Simoncelli wrote: > Hi all, > what is the best way to generate mail notification for cluster > events such as joins/leaves/fences? > I would rather not use an external monitor system like nagios and > ganglia but looks like those are the best practice for now. > Is there any other monitoring application/technique that I should consider? I've been meaning to look into this as well. Best I can find is called "RIND" (are we tired of recursive acronyms yet?) http://sources.redhat.com/cluster/wiki/EventScripting -- Jeff Macfarland (jmacfarland at nexatech.com) Nexa Technologies - 972.747.8879 Systems Administrator GPG Key ID: 0x5F1CA61B GPG Key Server: hkp://wwwkeys.pgp.net From jparsons at redhat.com Mon Oct 13 15:41:58 2008 From: jparsons at redhat.com (jim parsons) Date: Mon, 13 Oct 2008 11:41:58 -0400 Subject: [Linux-cluster] Cluster monitoring In-Reply-To: <48F36ACC.3000307@nexatech.com> References: <48F36ACC.3000307@nexatech.com> Message-ID: <1223912518.3298.3.camel@localhost.localdomain> On Mon, 2008-10-13 at 10:35 -0500, Jeff Macfarland wrote: > Federico Simoncelli wrote: > > Hi all, > > what is the best way to generate mail notification for cluster > > events such as joins/leaves/fences? 
> > I would rather not use an external monitor system like nagios and > > ganglia but looks like those are the best practice for now. > > Is there any other monitoring application/technique that I should consider? > > I've been meaning to look into this as well. Best I can find is called > "RIND" (are we tired of recursive acronyms yet?) > > http://sources.redhat.com/cluster/wiki/EventScripting > One easy way for just fence notifications would be to write a fence agent that sent mail to a mail list you could include as one of its cluster.conf attributes. Then place it in each fence block you define as the first fence action. Extra credit would be mailing the success/failure of the fence attempt. :) -J From gordan at bobich.net Mon Oct 13 15:55:22 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 13 Oct 2008 16:55:22 +0100 Subject: [Linux-cluster] Cluster Aware Software RAID (md) Message-ID: <48BF360208406B67@> (added by '') Has any progress been made on this? I saw some posts from 3-4 years ago in the OpenGFS archives saying it was worked on, but not seen anything since. What I'm tring to do is have 2 servers with RAID15 (or 16, or 10) between them. Have each disk mirrored with DRBD, and md RAID on top (and GFS on top of that). I can see this would work with a fail-over configuration, but in active-active there would be RAID metadata inconsistencies. Is there a way to handle the active-active scenario? I could invert the md and DRBD layers, but this would result in a massive reduction in fault tolerance (RAID51 instead RAID15), so I'd rather like to avoid this. TIA. Gordan From shawnlhood at gmail.com Mon Oct 13 19:33:42 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 13 Oct 2008 15:33:42 -0400 Subject: [Linux-cluster] GFS reserved blocks? Message-ID: Does GFS reserve blocks for the superuser, a la ext3's "Reserved block count"? I've had a ~1.1TB FS report that it's full with df reporting ~100GB remaining. -- Shawn Hood 910.670.1819 m From jason.huddleston at verizon.com Mon Oct 13 20:00:23 2008 From: jason.huddleston at verizon.com (Jason Huddleston) Date: Mon, 13 Oct 2008 15:00:23 -0500 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: References: Message-ID: <48F3A8D7.3000309@verizon.com> Shawn, I have been seeing the same thing on one of my clusters (shown below) under Red Hat 4.6. I found some details on this under an article on the open-shared root web site (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) and an article in Red Hat's knowledge base (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the reclaim of metadata blocks when an inode is released. I saw a patch (bz298931) released for this in the 2.99.10 cluster release notes but it was reverted (bz298931) a few days after it was submitted. The only suggestion that I have gotten back from Red Hat is to shutdown the app so the GFS drives are not being accessed and then run the "gfs_tool reclaim " command. 
[root at omzdwcdrp003 ~]# gfs_tool df /l1load1 /l1load1: SB lock proto = "lock_dlm" SB lock table = "DWCDR_prod:l1load1" SB ondisk format = 1309 SB multihost format = 1401 Block size = 4096 Journals = 20 Resource Groups = 6936 Mounted lock proto = "lock_dlm" Mounted lock table = "DWCDR_prod:l1load1" Mounted host data = "" Journal number = 13 Lock module flags = Local flocks = FALSE Local caching = FALSE Oopses OK = FALSE Type Total Used Free use% ------------------------------------------------------------------------ inodes 155300 155300 0 100% metadata 2016995 675430 1341565 33% data 452302809 331558847 120743962 73% [root at omzdwcdrp003 ~]# df -h /l1load1 Filesystem Size Used Avail Use% Mounted on /dev/mapper/l1load1--vg-l1load1--lv 1.7T 1.3T 468G 74% /l1load1 [root at omzdwcdrp003 ~]# du -sh /l1load1 18G /l1load1 ---- Jason Huddleston, RHCE ---- PS-USE-Linux Partner Support - Unix Support and Engineering Verizon Information Processing Services Shawn Hood wrote: > Does GFS reserve blocks for the superuser, a la ext3's "Reserved block > count"? I've had a ~1.1TB FS report that it's full with df reporting > ~100GB remaining. > > From shawnlhood at gmail.com Mon Oct 13 20:02:59 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 13 Oct 2008 16:02:59 -0400 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: <48F3A8D7.3000309@verizon.com> References: <48F3A8D7.3000309@verizon.com> Message-ID: I actually just ran the reclaim on a live filesystem and it seems to be working okay now. Hopefully this isn't problematic, as a large number of operations in the GFS tool suite operate on mounted filesystems. Shawn On Mon, Oct 13, 2008 at 4:00 PM, Jason Huddleston wrote: > Shawn, > I have been seeing the same thing on one of my clusters (shown below) > under Red Hat 4.6. I found some details on this under an article on the > open-shared root web site > (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) > and an article in Red Hat's knowledge base > (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the > reclaim of metadata blocks when an inode is released. I saw a patch > (bz298931) released for this in the 2.99.10 cluster release notes but it was > reverted (bz298931) a few days after it was submitted. The only suggestion > that I have gotten back from Red Hat is to shutdown the app so the GFS > drives are not being accessed and then run the "gfs_tool reclaim point>" command. 
> > [root at omzdwcdrp003 ~]# gfs_tool df /l1load1 > /l1load1: > SB lock proto = "lock_dlm" > SB lock table = "DWCDR_prod:l1load1" > SB ondisk format = 1309 > SB multihost format = 1401 > Block size = 4096 > Journals = 20 > Resource Groups = 6936 > Mounted lock proto = "lock_dlm" > Mounted lock table = "DWCDR_prod:l1load1" > Mounted host data = "" > Journal number = 13 > Lock module flags = > Local flocks = FALSE > Local caching = FALSE > Oopses OK = FALSE > > Type Total Used Free use% > ------------------------------------------------------------------------ > inodes 155300 155300 0 100% > metadata 2016995 675430 1341565 33% > data 452302809 331558847 120743962 73% > [root at omzdwcdrp003 ~]# df -h /l1load1 > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/l1load1--vg-l1load1--lv > 1.7T 1.3T 468G 74% /l1load1 > [root at omzdwcdrp003 ~]# du -sh /l1load1 > 18G /l1load1 > > ---- > Jason Huddleston, RHCE > ---- > PS-USE-Linux > Partner Support - Unix Support and Engineering > Verizon Information Processing Services > > > > Shawn Hood wrote: >> >> Does GFS reserve blocks for the superuser, a la ext3's "Reserved block >> count"? I've had a ~1.1TB FS report that it's full with df reporting >> ~100GB remaining. >> >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Shawn Hood 910.670.1819 m From jason.huddleston at verizon.com Mon Oct 13 20:18:51 2008 From: jason.huddleston at verizon.com (Jason Huddleston) Date: Mon, 13 Oct 2008 15:18:51 -0500 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: References: <48F3A8D7.3000309@verizon.com> Message-ID: <48F3AD2B.3090504@verizon.com> I've been watching mine do this for about two months now. I think it started when I upgraded from RHEL 4.5 to 4.6. The app team only has about 18 gig used on that 1.7TB drive but they create and delete allot of files because that is the loading area they used when new data comes in. In the last month I have seen it go up to 70 to 85% used but it usually comes back down to about 50% within about 24 hours. Hopefully they will find a fix for this soon. --- Jay Shawn Hood wrote: > I actually just ran the reclaim on a live filesystem and it seems to > be working okay now. Hopefully this isn't problematic, as a large > number of operations in the GFS tool suite operate on mounted > filesystems. > > Shawn > > On Mon, Oct 13, 2008 at 4:00 PM, Jason Huddleston > wrote: > >> Shawn, >> I have been seeing the same thing on one of my clusters (shown below) >> under Red Hat 4.6. I found some details on this under an article on the >> open-shared root web site >> (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) >> and an article in Red Hat's knowledge base >> (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the >> reclaim of metadata blocks when an inode is released. I saw a patch >> (bz298931) released for this in the 2.99.10 cluster release notes but it was >> reverted (bz298931) a few days after it was submitted. The only suggestion >> that I have gotten back from Red Hat is to shutdown the app so the GFS >> drives are not being accessed and then run the "gfs_tool reclaim > point>" command. 
>> >> [root at omzdwcdrp003 ~]# gfs_tool df /l1load1 >> /l1load1: >> SB lock proto = "lock_dlm" >> SB lock table = "DWCDR_prod:l1load1" >> SB ondisk format = 1309 >> SB multihost format = 1401 >> Block size = 4096 >> Journals = 20 >> Resource Groups = 6936 >> Mounted lock proto = "lock_dlm" >> Mounted lock table = "DWCDR_prod:l1load1" >> Mounted host data = "" >> Journal number = 13 >> Lock module flags = >> Local flocks = FALSE >> Local caching = FALSE >> Oopses OK = FALSE >> >> Type Total Used Free use% >> ------------------------------------------------------------------------ >> inodes 155300 155300 0 100% >> metadata 2016995 675430 1341565 33% >> data 452302809 331558847 120743962 73% >> [root at omzdwcdrp003 ~]# df -h /l1load1 >> Filesystem Size Used Avail Use% Mounted on >> /dev/mapper/l1load1--vg-l1load1--lv >> 1.7T 1.3T 468G 74% /l1load1 >> [root at omzdwcdrp003 ~]# du -sh /l1load1 >> 18G /l1load1 >> >> ---- >> Jason Huddleston, RHCE >> ---- >> PS-USE-Linux >> Partner Support - Unix Support and Engineering >> Verizon Information Processing Services >> >> >> >> Shawn Hood wrote: >> >>> Does GFS reserve blocks for the superuser, a la ext3's "Reserved block >>> count"? I've had a ~1.1TB FS report that it's full with df reporting >>> ~100GB remaining. >>> >>> >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From beekhof at gmail.com Mon Oct 13 20:20:04 2008 From: beekhof at gmail.com (Andrew Beekhof) Date: Mon, 13 Oct 2008 22:20:04 +0200 Subject: [Linux-cluster] Cib.xml --> haresources In-Reply-To: References: Message-ID: <26ef5e70810131320w167f96fcw8301573a86884c12@mail.gmail.com> 2008/10/11 Stephen Nelson-Smith : > Hi, > > I want to managed a linux-ha cluster (heartbeat 2) with puppet, > however I think I'm likely to struggle with cib.xml, as it contains > runtime info, and the cluster software seems like it won't take kindly > to being deployed automatically. > > I think the approach is to put the config in an haresources file, and > use the haresources conversion to cib.xml tool. However, how do I put > the knowledge I've put into my cib.xml (attached) into haresources? You can't and shouldn't even if you could. That tool is intended to be run once per cluster and not on a recurring basis. > Or am I missing the point entirely and there's a much better/easier > way to do this? cibadmin -R http://clusterlabs.org/mw/Image:Configuration_Explained.pdf > > S. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From kanderso at redhat.com Mon Oct 13 20:29:41 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Mon, 13 Oct 2008 15:29:41 -0500 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: <48F3AD2B.3090504@verizon.com> References: <48F3A8D7.3000309@verizon.com> <48F3AD2B.3090504@verizon.com> Message-ID: <1223929781.2991.47.camel@dhcp80-204.msp.redhat.com> For gfs, the recommended solution is to periodically run gfs_tool reclaim on your filesystems at a time of your choosing. Depending on the frequency of your deletes, this might be once a day or once a week. The only downside is the during the reclaim operation, the filesystem is locked from other activities. As the reclaim is relatively fast, this doesn't really cause a problem. But scheduling the command to be run during "idle" times of the day will mitigate the impact. 
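A minimal sketch of what that scheduling could look like (/l1load1 is the mount point from earlier in this thread; the day and hour are arbitrary examples, and whether gfs_tool reclaim prompts for confirmation depends on the version, so run it by hand once before trusting it to cron):

# /etc/cron.d/gfs-reclaim -- weekly metadata reclaim during a quiet window.
# The piped "y" answers the confirmation prompt if your gfs_tool asks one;
# drop it if yours runs non-interactively.
0 3 * * 0  root  echo y | gfs_tool reclaim /l1load1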
We attempted to come up with a method of doing this automatically, but there are deadlock lock issues between gfs and the vfs layer that prevent it from being implemented. In addition, there is still the issue of when is the right time to do the reclaim, and this would be application specific. So, just run gfs_tool reclaim if your storage is getting consumed by metadata storage. Kevin On Mon, 2008-10-13 at 15:18 -0500, Jason Huddleston wrote: > I've been watching mine do this for about two months now. I think it > started when I upgraded from RHEL 4.5 to 4.6. The app team only has > about 18 gig used on that 1.7TB drive but they create and delete allot > of files because that is the loading area they used when new data > comes in. In the last month I have seen it go up to 70 to 85% used but > it usually comes back down to about 50% within about 24 hours. > Hopefully they will find a fix for this soon. > > --- > Jay > > Shawn Hood wrote: > > I actually just ran the reclaim on a live filesystem and it seems to > > be working okay now. Hopefully this isn't problematic, as a large > > number of operations in the GFS tool suite operate on mounted > > filesystems. > > > > Shawn > > > > On Mon, Oct 13, 2008 at 4:00 PM, Jason Huddleston > > wrote: > > > > > Shawn, > > > I have been seeing the same thing on one of my clusters (shown below) > > > under Red Hat 4.6. I found some details on this under an article on the > > > open-shared root web site > > > (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) > > > and an article in Red Hat's knowledge base > > > (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the > > > reclaim of metadata blocks when an inode is released. I saw a patch > > > (bz298931) released for this in the 2.99.10 cluster release notes but it was > > > reverted (bz298931) a few days after it was submitted. The only suggestion > > > that I have gotten back from Red Hat is to shutdown the app so the GFS > > > drives are not being accessed and then run the "gfs_tool reclaim > > point>" command. > > > > > > [root at omzdwcdrp003 ~]# gfs_tool df /l1load1 > > > /l1load1: > > > SB lock proto = "lock_dlm" > > > SB lock table = "DWCDR_prod:l1load1" > > > SB ondisk format = 1309 > > > SB multihost format = 1401 > > > Block size = 4096 > > > Journals = 20 > > > Resource Groups = 6936 > > > Mounted lock proto = "lock_dlm" > > > Mounted lock table = "DWCDR_prod:l1load1" > > > Mounted host data = "" > > > Journal number = 13 > > > Lock module flags = > > > Local flocks = FALSE > > > Local caching = FALSE > > > Oopses OK = FALSE > > > > > > Type Total Used Free use% > > > ------------------------------------------------------------------------ > > > inodes 155300 155300 0 100% > > > metadata 2016995 675430 1341565 33% > > > data 452302809 331558847 120743962 73% > > > [root at omzdwcdrp003 ~]# df -h /l1load1 > > > Filesystem Size Used Avail Use% Mounted on > > > /dev/mapper/l1load1--vg-l1load1--lv > > > 1.7T 1.3T 468G 74% /l1load1 > > > [root at omzdwcdrp003 ~]# du -sh /l1load1 > > > 18G /l1load1 > > > > > > ---- > > > Jason Huddleston, RHCE > > > ---- > > > PS-USE-Linux > > > Partner Support - Unix Support and Engineering > > > Verizon Information Processing Services > > > > > > > > > > > > Shawn Hood wrote: > > > > > > > Does GFS reserve blocks for the superuser, a la ext3's "Reserved block > > > > count"? I've had a ~1.1TB FS report that it's full with df reporting > > > > ~100GB remaining. 
> > > > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From shawnlhood at gmail.com Mon Oct 13 20:33:25 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 13 Oct 2008 16:33:25 -0400 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: <1223929781.2991.47.camel@dhcp80-204.msp.redhat.com> References: <48F3A8D7.3000309@verizon.com> <48F3AD2B.3090504@verizon.com> <1223929781.2991.47.camel@dhcp80-204.msp.redhat.com> Message-ID: Someone give me write access to the FAQ! I've been compiling these undocumented (or hard to find) bits of knowledge for some time now. Shawn On Mon, Oct 13, 2008 at 4:29 PM, Kevin Anderson wrote: > For gfs, the recommended solution is to periodically run gfs_tool > reclaim on your filesystems at a time of your choosing. Depending on > the frequency of your deletes, this might be once a day or once a week. > The only downside is the during the reclaim operation, the filesystem is > locked from other activities. As the reclaim is relatively fast, this > doesn't really cause a problem. But scheduling the command to be run > during "idle" times of the day will mitigate the impact. > > We attempted to come up with a method of doing this automatically, but > there are deadlock lock issues between gfs and the vfs layer that > prevent it from being implemented. In addition, there is still the > issue of when is the right time to do the reclaim, and this would be > application specific. > > So, just run gfs_tool reclaim if your storage is getting consumed by > metadata storage. > > Kevin > > On Mon, 2008-10-13 at 15:18 -0500, Jason Huddleston wrote: >> I've been watching mine do this for about two months now. I think it >> started when I upgraded from RHEL 4.5 to 4.6. The app team only has >> about 18 gig used on that 1.7TB drive but they create and delete allot >> of files because that is the loading area they used when new data >> comes in. In the last month I have seen it go up to 70 to 85% used but >> it usually comes back down to about 50% within about 24 hours. >> Hopefully they will find a fix for this soon. >> >> --- >> Jay >> >> Shawn Hood wrote: >> > I actually just ran the reclaim on a live filesystem and it seems to >> > be working okay now. Hopefully this isn't problematic, as a large >> > number of operations in the GFS tool suite operate on mounted >> > filesystems. >> > >> > Shawn >> > >> > On Mon, Oct 13, 2008 at 4:00 PM, Jason Huddleston >> > wrote: >> > >> > > Shawn, >> > > I have been seeing the same thing on one of my clusters (shown below) >> > > under Red Hat 4.6. I found some details on this under an article on the >> > > open-shared root web site >> > > (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) >> > > and an article in Red Hat's knowledge base >> > > (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the >> > > reclaim of metadata blocks when an inode is released. I saw a patch >> > > (bz298931) released for this in the 2.99.10 cluster release notes but it was >> > > reverted (bz298931) a few days after it was submitted. The only suggestion >> > > that I have gotten back from Red Hat is to shutdown the app so the GFS >> > > drives are not being accessed and then run the "gfs_tool reclaim > > > point>" command. 
>> > > >> > > [root at omzdwcdrp003 ~]# gfs_tool df /l1load1 >> > > /l1load1: >> > > SB lock proto = "lock_dlm" >> > > SB lock table = "DWCDR_prod:l1load1" >> > > SB ondisk format = 1309 >> > > SB multihost format = 1401 >> > > Block size = 4096 >> > > Journals = 20 >> > > Resource Groups = 6936 >> > > Mounted lock proto = "lock_dlm" >> > > Mounted lock table = "DWCDR_prod:l1load1" >> > > Mounted host data = "" >> > > Journal number = 13 >> > > Lock module flags = >> > > Local flocks = FALSE >> > > Local caching = FALSE >> > > Oopses OK = FALSE >> > > >> > > Type Total Used Free use% >> > > ------------------------------------------------------------------------ >> > > inodes 155300 155300 0 100% >> > > metadata 2016995 675430 1341565 33% >> > > data 452302809 331558847 120743962 73% >> > > [root at omzdwcdrp003 ~]# df -h /l1load1 >> > > Filesystem Size Used Avail Use% Mounted on >> > > /dev/mapper/l1load1--vg-l1load1--lv >> > > 1.7T 1.3T 468G 74% /l1load1 >> > > [root at omzdwcdrp003 ~]# du -sh /l1load1 >> > > 18G /l1load1 >> > > >> > > ---- >> > > Jason Huddleston, RHCE >> > > ---- >> > > PS-USE-Linux >> > > Partner Support - Unix Support and Engineering >> > > Verizon Information Processing Services >> > > >> > > >> > > >> > > Shawn Hood wrote: >> > > >> > > > Does GFS reserve blocks for the superuser, a la ext3's "Reserved block >> > > > count"? I've had a ~1.1TB FS report that it's full with df reporting >> > > > ~100GB remaining. >> > > > >> > > > >> > > > >> > > -- >> > > Linux-cluster mailing list >> > > Linux-cluster at redhat.com >> > > https://www.redhat.com/mailman/listinfo/linux-cluster >> > > >> > > >> > >> > >> > >> > >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Shawn Hood 910.670.1819 m From jason.huddleston at verizon.com Mon Oct 13 20:39:26 2008 From: jason.huddleston at verizon.com (Jason Huddleston) Date: Mon, 13 Oct 2008 15:39:26 -0500 Subject: [Linux-cluster] GFS reserved blocks? In-Reply-To: References: <48F3A8D7.3000309@verizon.com> <48F3AD2B.3090504@verizon.com> <1223929781.2991.47.camel@dhcp80-204.msp.redhat.com> Message-ID: <48F3B1FE.3090500@verizon.com> Sweet. Maybe your notes will save someone else some time. I know it was a great resource for me when I set up my first GFS cluster. --- Jay Shawn Hood wrote: > Someone give me write access to the FAQ! I've been compiling these > undocumented (or hard to find) bits of knowledge for some time now. > > > Shawn > > On Mon, Oct 13, 2008 at 4:29 PM, Kevin Anderson wrote: > >> For gfs, the recommended solution is to periodically run gfs_tool >> reclaim on your filesystems at a time of your choosing. Depending on >> the frequency of your deletes, this might be once a day or once a week. >> The only downside is the during the reclaim operation, the filesystem is >> locked from other activities. As the reclaim is relatively fast, this >> doesn't really cause a problem. But scheduling the command to be run >> during "idle" times of the day will mitigate the impact. >> >> We attempted to come up with a method of doing this automatically, but >> there are deadlock lock issues between gfs and the vfs layer that >> prevent it from being implemented. In addition, there is still the >> issue of when is the right time to do the reclaim, and this would be >> application specific. 
>> >> So, just run gfs_tool reclaim if your storage is getting consumed by >> metadata storage. >> >> Kevin >> >> On Mon, 2008-10-13 at 15:18 -0500, Jason Huddleston wrote: >> >>> I've been watching mine do this for about two months now. I think it >>> started when I upgraded from RHEL 4.5 to 4.6. The app team only has >>> about 18 gig used on that 1.7TB drive but they create and delete allot >>> of files because that is the loading area they used when new data >>> comes in. In the last month I have seen it go up to 70 to 85% used but >>> it usually comes back down to about 50% within about 24 hours. >>> Hopefully they will find a fix for this soon. >>> >>> --- >>> Jay >>> >>> Shawn Hood wrote: >>> >>>> I actually just ran the reclaim on a live filesystem and it seems to >>>> be working okay now. Hopefully this isn't problematic, as a large >>>> number of operations in the GFS tool suite operate on mounted >>>> filesystems. >>>> >>>> Shawn >>>> >>>> On Mon, Oct 13, 2008 at 4:00 PM, Jason Huddleston >>>> wrote: >>>> >>>> >>>>> Shawn, >>>>> I have been seeing the same thing on one of my clusters (shown below) >>>>> under Red Hat 4.6. I found some details on this under an article on the >>>>> open-shared root web site >>>>> (http://www.open-sharedroot.org/faq/troubleshooting-guide/file-systems/gfs/file-system-full) >>>>> and an article in Red Hat's knowledge base >>>>> (http://kbase.redhat.com/faq/FAQ_78_10697.shtm). It seems to be a bug in the >>>>> reclaim of metadata blocks when an inode is released. I saw a patch >>>>> (bz298931) released for this in the 2.99.10 cluster release notes but it was >>>>> reverted (bz298931) a few days after it was submitted. The only suggestion >>>>> that I have gotten back from Red Hat is to shutdown the app so the GFS >>>>> drives are not being accessed and then run the "gfs_tool reclaim >>>> point>" command. >>>>> >>>>> [root at omzdwcdrp003 ~]# gfs_tool df /l1load1 >>>>> /l1load1: >>>>> SB lock proto = "lock_dlm" >>>>> SB lock table = "DWCDR_prod:l1load1" >>>>> SB ondisk format = 1309 >>>>> SB multihost format = 1401 >>>>> Block size = 4096 >>>>> Journals = 20 >>>>> Resource Groups = 6936 >>>>> Mounted lock proto = "lock_dlm" >>>>> Mounted lock table = "DWCDR_prod:l1load1" >>>>> Mounted host data = "" >>>>> Journal number = 13 >>>>> Lock module flags = >>>>> Local flocks = FALSE >>>>> Local caching = FALSE >>>>> Oopses OK = FALSE >>>>> >>>>> Type Total Used Free use% >>>>> ------------------------------------------------------------------------ >>>>> inodes 155300 155300 0 100% >>>>> metadata 2016995 675430 1341565 33% >>>>> data 452302809 331558847 120743962 73% >>>>> [root at omzdwcdrp003 ~]# df -h /l1load1 >>>>> Filesystem Size Used Avail Use% Mounted on >>>>> /dev/mapper/l1load1--vg-l1load1--lv >>>>> 1.7T 1.3T 468G 74% /l1load1 >>>>> [root at omzdwcdrp003 ~]# du -sh /l1load1 >>>>> 18G /l1load1 >>>>> >>>>> ---- >>>>> Jason Huddleston, RHCE >>>>> ---- >>>>> PS-USE-Linux >>>>> Partner Support - Unix Support and Engineering >>>>> Verizon Information Processing Services >>>>> >>>>> >>>>> >>>>> Shawn Hood wrote: >>>>> >>>>> >>>>>> Does GFS reserve blocks for the superuser, a la ext3's "Reserved block >>>>>> count"? I've had a ~1.1TB FS report that it's full with df reporting >>>>>> ~100GB remaining. 
>>>>>> >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> Linux-cluster mailing list >>>>> Linux-cluster at redhat.com >>>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shawnlhood at gmail.com Mon Oct 13 21:32:42 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 13 Oct 2008 17:32:42 -0400 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: As a heads up, I'm about to open a high priority bug on this. It's crippling us. Also, I meant to say it is a 4 node cluster, not a 3 node. Please let me know if I can provide any more information in addition to this. I will provide the information from a time series of gfs_tool counters commands with the support request. Shawn On Tue, Oct 7, 2008 at 1:40 PM, Shawn Hood wrote: > More info: > > All filesystems mounted using noatime,nodiratime,noquota. > > All filesystems report the same data from gfs_tool gettune: > > limit1 = 100 > ilimit1_tries = 3 > ilimit1_min = 1 > ilimit2 = 500 > ilimit2_tries = 10 > ilimit2_min = 3 > demote_secs = 300 > incore_log_blocks = 1024 > jindex_refresh_secs = 60 > depend_secs = 60 > scand_secs = 5 > recoverd_secs = 60 > logd_secs = 1 > quotad_secs = 5 > inoded_secs = 15 > glock_purge = 0 > quota_simul_sync = 64 > quota_warn_period = 10 > atime_quantum = 3600 > quota_quantum = 60 > quota_scale = 1.0000 (1, 1) > quota_enforce = 0 > quota_account = 0 > new_files_jdata = 0 > new_files_directio = 0 > max_atomic_write = 4194304 > max_readahead = 262144 > lockdump_size = 131072 > stall_secs = 600 > complain_secs = 10 > reclaim_limit = 5000 > entries_per_readdir = 32 > prefetch_secs = 10 > statfs_slots = 64 > max_mhc = 10000 > greedy_default = 100 > greedy_quantum = 25 > greedy_max = 250 > rgrp_try_threshold = 100 > statfs_fast = 0 > seq_readahead = 0 > > > And data on the FS from gfs_tool counters: > locks 2948 > locks held 1352 > freeze count 0 > incore inodes 1347 > metadata buffers 0 > unlinked inodes 0 > quota IDs 0 > incore log buffers 0 > log space used 0.05% > meta header cache entries 0 > glock dependencies 0 > glocks on reclaim list 0 > log wraps 2 > outstanding LM calls 0 > outstanding BIO calls 0 > fh2dentry misses 0 > glocks reclaimed 223287 > glock nq calls 1812286 > glock dq calls 1810926 > glock prefetch calls 101158 > lm_lock calls 198294 > lm_unlock calls 142643 > lm callbacks 341621 > address operations 502691 > dentry operations 395330 > export operations 0 > file operations 199243 > inode operations 984276 > super operations 1727082 > vm operations 0 > block I/O reads 520531 > block I/O writes 130315 > > locks 171423 > locks held 85717 > freeze count 0 > incore inodes 85376 > metadata buffers 1474 > unlinked inodes 0 > quota IDs 0 > incore log buffers 24 > log space used 0.83% > meta header cache entries 6621 > glock dependencies 2037 > glocks on reclaim list 0 > log wraps 428 > outstanding LM calls 0 > outstanding BIO calls 0 > fh2dentry misses 0 > glocks reclaimed 45784677 > glock nq calls 962822941 > glock dq calls 962595532 > glock prefetch calls 20215922 > lm_lock calls 40708633 > lm_unlock calls 23410498 > lm callbacks 64156052 > address operations 705464659 > dentry 
operations 19701522 > export operations 0 > file operations 364990733 > inode operations 98910127 > super operations 440061034 > vm operations 7 > block I/O reads 90394984 > block I/O writes 131199864 > > locks 2916542 > locks held 1476005 > freeze count 0 > incore inodes 1454165 > metadata buffers 12539 > unlinked inodes 100 > quota IDs 0 > incore log buffers 11 > log space used 13.33% > meta header cache entries 9928 > glock dependencies 110 > glocks on reclaim list 0 > log wraps 2393 > outstanding LM calls 25 > outstanding BIO calls 0 > fh2dentry misses 55546 > glocks reclaimed 127341056 > glock nq calls 867427 > glock dq calls 867430 > glock prefetch calls 36679316 > lm_lock calls 110179878 > lm_unlock calls 84588424 > lm callbacks 194863553 > address operations 250891447 > dentry operations 359537343 > export operations 390941288 > file operations 399156716 > inode operations 537830 > super operations 1093798409 > vm operations 774785 > block I/O reads 258044208 > block I/O writes 101585172 > > > > On Tue, Oct 7, 2008 at 1:33 PM, Shawn Hood wrote: >> Problem: >> It seems that IO on one machine in the cluster (not always the same >> machine) will hang and all processes accessing clustered LVs will >> block. Other machines will follow suit shortly thereafter until the >> machine that first exhibited the problem is rebooted (via fence_drac >> manually). No messages in dmesg, syslog, etc. Filesystems recently >> fsckd. >> >> Hardware: >> Dell 1950s (similar except memory -- 3x 16GB RAM, 1x 8GB RAM). >> Running RHEL4 ES U7. Four machines >> Onboard gigabit NICs (Machines use little bandwidth, and all network >> traffic including DLM share NICs) >> QLogic 2462 PCI-Express dual channel FC HBAs >> QLogic SANBox 5200 FC switch >> Apple XRAID which presents as two LUNs (~4.5TB raw aggregate) >> Cisco Catalyst switch >> >> Simple four machine RHEL4 U7 cluster running kernel 2.6.9-78.0.1.ELsmp >> x86_64 with the following packages: >> ccs-1.0.12-1 >> cman-1.0.24-1 >> cman-kernel-smp-2.6.9-55.13.el4_7.1 >> cman-kernheaders-2.6.9-55.13.el4_7.1 >> dlm-kernel-smp-2.6.9-54.11.el4_7.1 >> dlm-kernheaders-2.6.9-54.11.el4_7.1 >> fence-1.32.63-1.el4_7.1 >> GFS-6.1.18-1 >> GFS-kernel-smp-2.6.9-80.9.el4_7.1 >> >> One clustered VG. Striped across two physical volumes, which >> correspond to each side of an Apple XRAID. >> Clustered volume group info: >> --- Volume group --- >> VG Name hq-san >> System ID >> Format lvm2 >> Metadata Areas 2 >> Metadata Sequence No 50 >> VG Access read/write >> VG Status resizable >> Clustered yes >> Shared no >> MAX LV 0 >> Cur LV 3 >> Open LV 3 >> Max PV 0 >> Cur PV 2 >> Act PV 2 >> VG Size 4.55 TB >> PE Size 4.00 MB >> Total PE 1192334 >> Alloc PE / Size 905216 / 3.45 TB >> Free PE / Size 287118 / 1.10 TB >> VG UUID hfeIhf-fzEq-clCf-b26M-cMy3-pphm-B6wmLv >> >> Logical volumes contained with hq-san VG: >> cam_development hq-san -wi-ao 500.00G >> qa hq-san -wi-ao 1.07T >> svn_users hq-san -wi-ao 1.89T >> >> All four machines mount svn_users, two machines mount qa, and one >> mounts cam_development. 
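A rough sketch of collecting the "time series of gfs_tool counters" mentioned above, in case it helps anyone hitting the same hang. The mount point, interval and output directory are placeholders; the commented-out gfs_tool lockdump line dumps glock state, which can be useful alongside the counters but grows very large.

#!/bin/bash
# collect_gfs_stats.sh -- illustrative helper, not part of any package.
# Snapshot gfs_tool counters for each listed GFS mount once a minute so
# the values can be compared over time when a node starts to hang.
MOUNTS="/mnt/gfs1"            # replace with the GFS mount point(s) on this node
INTERVAL=60
OUTDIR=/var/tmp/gfs-stats
mkdir -p "$OUTDIR"
while true; do
    ts=$(date +%Y%m%d-%H%M%S)
    for m in $MOUNTS; do
        tag=$(echo "$m" | tr / _)
        gfs_tool counters "$m" > "$OUTDIR/counters$tag.$ts" 2>&1
        # gfs_tool lockdump "$m" > "$OUTDIR/lockdump$tag.$ts" 2>&1
    done
    sleep "$INTERVAL"
done
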
>> >> /etc/cluster/cluster.conf: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > ipaddr="redacted" login="root" passwd="redacted"/> >> > ipaddr="redacted" login="root" passwd="redacted"/> >> > ipaddr="redacted" login="root" passwd="redacted"/> >> > ipaddr="redacted" login="root" passwd="redacted"/> >> >> >> >> >> >> >> >> >> >> >> -- >> Shawn Hood >> 910.670.1819 m >> > > > > -- > Shawn Hood > 910.670.1819 m > -- Shawn Hood 910.670.1819 m From shawnlhood at gmail.com Mon Oct 13 21:32:54 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Mon, 13 Oct 2008 17:32:54 -0400 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: High priorty support request, I mean. On Mon, Oct 13, 2008 at 5:32 PM, Shawn Hood wrote: > As a heads up, I'm about to open a high priority bug on this. It's > crippling us. Also, I meant to say it is a 4 node cluster, not a 3 > node. > > Please let me know if I can provide any more information in addition > to this. I will provide the information from a time series of > gfs_tool counters commands with the support request. > > Shawn > > On Tue, Oct 7, 2008 at 1:40 PM, Shawn Hood wrote: >> More info: >> >> All filesystems mounted using noatime,nodiratime,noquota. >> >> All filesystems report the same data from gfs_tool gettune: >> >> limit1 = 100 >> ilimit1_tries = 3 >> ilimit1_min = 1 >> ilimit2 = 500 >> ilimit2_tries = 10 >> ilimit2_min = 3 >> demote_secs = 300 >> incore_log_blocks = 1024 >> jindex_refresh_secs = 60 >> depend_secs = 60 >> scand_secs = 5 >> recoverd_secs = 60 >> logd_secs = 1 >> quotad_secs = 5 >> inoded_secs = 15 >> glock_purge = 0 >> quota_simul_sync = 64 >> quota_warn_period = 10 >> atime_quantum = 3600 >> quota_quantum = 60 >> quota_scale = 1.0000 (1, 1) >> quota_enforce = 0 >> quota_account = 0 >> new_files_jdata = 0 >> new_files_directio = 0 >> max_atomic_write = 4194304 >> max_readahead = 262144 >> lockdump_size = 131072 >> stall_secs = 600 >> complain_secs = 10 >> reclaim_limit = 5000 >> entries_per_readdir = 32 >> prefetch_secs = 10 >> statfs_slots = 64 >> max_mhc = 10000 >> greedy_default = 100 >> greedy_quantum = 25 >> greedy_max = 250 >> rgrp_try_threshold = 100 >> statfs_fast = 0 >> seq_readahead = 0 >> >> >> And data on the FS from gfs_tool counters: >> locks 2948 >> locks held 1352 >> freeze count 0 >> incore inodes 1347 >> metadata buffers 0 >> unlinked inodes 0 >> quota IDs 0 >> incore log buffers 0 >> log space used 0.05% >> meta header cache entries 0 >> glock dependencies 0 >> glocks on reclaim list 0 >> log wraps 2 >> outstanding LM calls 0 >> outstanding BIO calls 0 >> fh2dentry misses 0 >> glocks reclaimed 223287 >> glock nq calls 1812286 >> glock dq calls 1810926 >> glock prefetch calls 101158 >> lm_lock calls 198294 >> lm_unlock calls 142643 >> lm callbacks 341621 >> address operations 502691 >> dentry operations 395330 >> export operations 0 >> file operations 199243 >> inode operations 984276 >> super operations 1727082 >> vm operations 0 >> block I/O reads 520531 >> block I/O writes 130315 >> >> locks 171423 >> locks held 85717 >> freeze count 0 >> incore inodes 85376 >> metadata buffers 1474 >> unlinked inodes 0 >> quota IDs 0 >> incore log buffers 24 >> log space used 0.83% >> meta header cache entries 6621 >> glock dependencies 2037 >> glocks on reclaim list 0 >> log wraps 428 >> outstanding LM calls 0 >> outstanding BIO calls 0 >> fh2dentry misses 0 >> glocks reclaimed 45784677 >> glock nq calls 962822941 >> 
glock dq calls 962595532 >> glock prefetch calls 20215922 >> lm_lock calls 40708633 >> lm_unlock calls 23410498 >> lm callbacks 64156052 >> address operations 705464659 >> dentry operations 19701522 >> export operations 0 >> file operations 364990733 >> inode operations 98910127 >> super operations 440061034 >> vm operations 7 >> block I/O reads 90394984 >> block I/O writes 131199864 >> >> locks 2916542 >> locks held 1476005 >> freeze count 0 >> incore inodes 1454165 >> metadata buffers 12539 >> unlinked inodes 100 >> quota IDs 0 >> incore log buffers 11 >> log space used 13.33% >> meta header cache entries 9928 >> glock dependencies 110 >> glocks on reclaim list 0 >> log wraps 2393 >> outstanding LM calls 25 >> outstanding BIO calls 0 >> fh2dentry misses 55546 >> glocks reclaimed 127341056 >> glock nq calls 867427 >> glock dq calls 867430 >> glock prefetch calls 36679316 >> lm_lock calls 110179878 >> lm_unlock calls 84588424 >> lm callbacks 194863553 >> address operations 250891447 >> dentry operations 359537343 >> export operations 390941288 >> file operations 399156716 >> inode operations 537830 >> super operations 1093798409 >> vm operations 774785 >> block I/O reads 258044208 >> block I/O writes 101585172 >> >> >> >> On Tue, Oct 7, 2008 at 1:33 PM, Shawn Hood wrote: >>> Problem: >>> It seems that IO on one machine in the cluster (not always the same >>> machine) will hang and all processes accessing clustered LVs will >>> block. Other machines will follow suit shortly thereafter until the >>> machine that first exhibited the problem is rebooted (via fence_drac >>> manually). No messages in dmesg, syslog, etc. Filesystems recently >>> fsckd. >>> >>> Hardware: >>> Dell 1950s (similar except memory -- 3x 16GB RAM, 1x 8GB RAM). >>> Running RHEL4 ES U7. Four machines >>> Onboard gigabit NICs (Machines use little bandwidth, and all network >>> traffic including DLM share NICs) >>> QLogic 2462 PCI-Express dual channel FC HBAs >>> QLogic SANBox 5200 FC switch >>> Apple XRAID which presents as two LUNs (~4.5TB raw aggregate) >>> Cisco Catalyst switch >>> >>> Simple four machine RHEL4 U7 cluster running kernel 2.6.9-78.0.1.ELsmp >>> x86_64 with the following packages: >>> ccs-1.0.12-1 >>> cman-1.0.24-1 >>> cman-kernel-smp-2.6.9-55.13.el4_7.1 >>> cman-kernheaders-2.6.9-55.13.el4_7.1 >>> dlm-kernel-smp-2.6.9-54.11.el4_7.1 >>> dlm-kernheaders-2.6.9-54.11.el4_7.1 >>> fence-1.32.63-1.el4_7.1 >>> GFS-6.1.18-1 >>> GFS-kernel-smp-2.6.9-80.9.el4_7.1 >>> >>> One clustered VG. Striped across two physical volumes, which >>> correspond to each side of an Apple XRAID. >>> Clustered volume group info: >>> --- Volume group --- >>> VG Name hq-san >>> System ID >>> Format lvm2 >>> Metadata Areas 2 >>> Metadata Sequence No 50 >>> VG Access read/write >>> VG Status resizable >>> Clustered yes >>> Shared no >>> MAX LV 0 >>> Cur LV 3 >>> Open LV 3 >>> Max PV 0 >>> Cur PV 2 >>> Act PV 2 >>> VG Size 4.55 TB >>> PE Size 4.00 MB >>> Total PE 1192334 >>> Alloc PE / Size 905216 / 3.45 TB >>> Free PE / Size 287118 / 1.10 TB >>> VG UUID hfeIhf-fzEq-clCf-b26M-cMy3-pphm-B6wmLv >>> >>> Logical volumes contained with hq-san VG: >>> cam_development hq-san -wi-ao 500.00G >>> qa hq-san -wi-ao 1.07T >>> svn_users hq-san -wi-ao 1.89T >>> >>> All four machines mount svn_users, two machines mount qa, and one >>> mounts cam_development. 
>>> >>> /etc/cluster/cluster.conf: >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >> ipaddr="redacted" login="root" passwd="redacted"/> >>> >> ipaddr="redacted" login="root" passwd="redacted"/> >>> >> ipaddr="redacted" login="root" passwd="redacted"/> >>> >> ipaddr="redacted" login="root" passwd="redacted"/> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> Shawn Hood >>> 910.670.1819 m >>> >> >> >> >> -- >> Shawn Hood >> 910.670.1819 m >> > > > > -- > Shawn Hood > 910.670.1819 m > -- Shawn Hood 910.670.1819 m From jason.huddleston at verizon.com Mon Oct 13 21:38:15 2008 From: jason.huddleston at verizon.com (Jason Huddleston) Date: Mon, 13 Oct 2008 16:38:15 -0500 Subject: [Linux-cluster] Re: GFS hanging on 3 node RHEL4 cluster In-Reply-To: References: Message-ID: <48F3BFC7.9090307@verizon.com> Shawn, Looking at the output below you may want to try and increase statfs_slots to 256. Also, if you have any disk monitoring utilities that monitor drive usage you may want to set statfs_fast equal to 1. --- Jay Shawn Hood wrote: > High priorty support request, I mean. > > On Mon, Oct 13, 2008 at 5:32 PM, Shawn Hood wrote: > >> As a heads up, I'm about to open a high priority bug on this. It's >> crippling us. Also, I meant to say it is a 4 node cluster, not a 3 >> node. >> >> Please let me know if I can provide any more information in addition >> to this. I will provide the information from a time series of >> gfs_tool counters commands with the support request. >> >> Shawn >> >> On Tue, Oct 7, 2008 at 1:40 PM, Shawn Hood wrote: >> >>> More info: >>> >>> All filesystems mounted using noatime,nodiratime,noquota. >>> >>> All filesystems report the same data from gfs_tool gettune: >>> >>> limit1 = 100 >>> ilimit1_tries = 3 >>> ilimit1_min = 1 >>> ilimit2 = 500 >>> ilimit2_tries = 10 >>> ilimit2_min = 3 >>> demote_secs = 300 >>> incore_log_blocks = 1024 >>> jindex_refresh_secs = 60 >>> depend_secs = 60 >>> scand_secs = 5 >>> recoverd_secs = 60 >>> logd_secs = 1 >>> quotad_secs = 5 >>> inoded_secs = 15 >>> glock_purge = 0 >>> quota_simul_sync = 64 >>> quota_warn_period = 10 >>> atime_quantum = 3600 >>> quota_quantum = 60 >>> quota_scale = 1.0000 (1, 1) >>> quota_enforce = 0 >>> quota_account = 0 >>> new_files_jdata = 0 >>> new_files_directio = 0 >>> max_atomic_write = 4194304 >>> max_readahead = 262144 >>> lockdump_size = 131072 >>> stall_secs = 600 >>> complain_secs = 10 >>> reclaim_limit = 5000 >>> entries_per_readdir = 32 >>> prefetch_secs = 10 >>> statfs_slots = 64 >>> max_mhc = 10000 >>> greedy_default = 100 >>> greedy_quantum = 25 >>> greedy_max = 250 >>> rgrp_try_threshold = 100 >>> statfs_fast = 0 >>> seq_readahead = 0 >>> >>> >>> And data on the FS from gfs_tool counters: >>> locks 2948 >>> locks held 1352 >>> freeze count 0 >>> incore inodes 1347 >>> metadata buffers 0 >>> unlinked inodes 0 >>> quota IDs 0 >>> incore log buffers 0 >>> log space used 0.05% >>> meta header cache entries 0 >>> glock dependencies 0 >>> glocks on reclaim list 0 >>> log wraps 2 >>> outstanding LM calls 0 >>> outstanding BIO calls 0 >>> fh2dentry misses 0 >>> glocks reclaimed 223287 >>> glock nq calls 1812286 >>> glock dq calls 1810926 >>> glock prefetch calls 101158 >>> lm_lock calls 198294 >>> lm_unlock calls 142643 >>> lm callbacks 341621 >>> address operations 502691 >>> dentry operations 395330 >>> export operations 0 >>> file operations 199243 >>> inode operations 984276 >>> super operations 1727082 >>> 
vm operations 0 >>> block I/O reads 520531 >>> block I/O writes 130315 >>> >>> locks 171423 >>> locks held 85717 >>> freeze count 0 >>> incore inodes 85376 >>> metadata buffers 1474 >>> unlinked inodes 0 >>> quota IDs 0 >>> incore log buffers 24 >>> log space used 0.83% >>> meta header cache entries 6621 >>> glock dependencies 2037 >>> glocks on reclaim list 0 >>> log wraps 428 >>> outstanding LM calls 0 >>> outstanding BIO calls 0 >>> fh2dentry misses 0 >>> glocks reclaimed 45784677 >>> glock nq calls 962822941 >>> glock dq calls 962595532 >>> glock prefetch calls 20215922 >>> lm_lock calls 40708633 >>> lm_unlock calls 23410498 >>> lm callbacks 64156052 >>> address operations 705464659 >>> dentry operations 19701522 >>> export operations 0 >>> file operations 364990733 >>> inode operations 98910127 >>> super operations 440061034 >>> vm operations 7 >>> block I/O reads 90394984 >>> block I/O writes 131199864 >>> >>> locks 2916542 >>> locks held 1476005 >>> freeze count 0 >>> incore inodes 1454165 >>> metadata buffers 12539 >>> unlinked inodes 100 >>> quota IDs 0 >>> incore log buffers 11 >>> log space used 13.33% >>> meta header cache entries 9928 >>> glock dependencies 110 >>> glocks on reclaim list 0 >>> log wraps 2393 >>> outstanding LM calls 25 >>> outstanding BIO calls 0 >>> fh2dentry misses 55546 >>> glocks reclaimed 127341056 >>> glock nq calls 867427 >>> glock dq calls 867430 >>> glock prefetch calls 36679316 >>> lm_lock calls 110179878 >>> lm_unlock calls 84588424 >>> lm callbacks 194863553 >>> address operations 250891447 >>> dentry operations 359537343 >>> export operations 390941288 >>> file operations 399156716 >>> inode operations 537830 >>> super operations 1093798409 >>> vm operations 774785 >>> block I/O reads 258044208 >>> block I/O writes 101585172 >>> >>> >>> >>> On Tue, Oct 7, 2008 at 1:33 PM, Shawn Hood wrote: >>> >>>> Problem: >>>> It seems that IO on one machine in the cluster (not always the same >>>> machine) will hang and all processes accessing clustered LVs will >>>> block. Other machines will follow suit shortly thereafter until the >>>> machine that first exhibited the problem is rebooted (via fence_drac >>>> manually). No messages in dmesg, syslog, etc. Filesystems recently >>>> fsckd. >>>> >>>> Hardware: >>>> Dell 1950s (similar except memory -- 3x 16GB RAM, 1x 8GB RAM). >>>> Running RHEL4 ES U7. Four machines >>>> Onboard gigabit NICs (Machines use little bandwidth, and all network >>>> traffic including DLM share NICs) >>>> QLogic 2462 PCI-Express dual channel FC HBAs >>>> QLogic SANBox 5200 FC switch >>>> Apple XRAID which presents as two LUNs (~4.5TB raw aggregate) >>>> Cisco Catalyst switch >>>> >>>> Simple four machine RHEL4 U7 cluster running kernel 2.6.9-78.0.1.ELsmp >>>> x86_64 with the following packages: >>>> ccs-1.0.12-1 >>>> cman-1.0.24-1 >>>> cman-kernel-smp-2.6.9-55.13.el4_7.1 >>>> cman-kernheaders-2.6.9-55.13.el4_7.1 >>>> dlm-kernel-smp-2.6.9-54.11.el4_7.1 >>>> dlm-kernheaders-2.6.9-54.11.el4_7.1 >>>> fence-1.32.63-1.el4_7.1 >>>> GFS-6.1.18-1 >>>> GFS-kernel-smp-2.6.9-80.9.el4_7.1 >>>> >>>> One clustered VG. Striped across two physical volumes, which >>>> correspond to each side of an Apple XRAID. 
>>>> Clustered volume group info: >>>> --- Volume group --- >>>> VG Name hq-san >>>> System ID >>>> Format lvm2 >>>> Metadata Areas 2 >>>> Metadata Sequence No 50 >>>> VG Access read/write >>>> VG Status resizable >>>> Clustered yes >>>> Shared no >>>> MAX LV 0 >>>> Cur LV 3 >>>> Open LV 3 >>>> Max PV 0 >>>> Cur PV 2 >>>> Act PV 2 >>>> VG Size 4.55 TB >>>> PE Size 4.00 MB >>>> Total PE 1192334 >>>> Alloc PE / Size 905216 / 3.45 TB >>>> Free PE / Size 287118 / 1.10 TB >>>> VG UUID hfeIhf-fzEq-clCf-b26M-cMy3-pphm-B6wmLv >>>> >>>> Logical volumes contained with hq-san VG: >>>> cam_development hq-san -wi-ao 500.00G >>>> qa hq-san -wi-ao 1.07T >>>> svn_users hq-san -wi-ao 1.89T >>>> >>>> All four machines mount svn_users, two machines mount qa, and one >>>> mounts cam_development. >>>> >>>> /etc/cluster/cluster.conf: >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> ipaddr="redacted" login="root" passwd="redacted"/> >>>> >>> ipaddr="redacted" login="root" passwd="redacted"/> >>>> >>> ipaddr="redacted" login="root" passwd="redacted"/> >>>> >>> ipaddr="redacted" login="root" passwd="redacted"/> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Shawn Hood >>>> 910.670.1819 m >>>> >>>> >>> >>> -- >>> Shawn Hood >>> 910.670.1819 m >>> >>> >> >> -- >> Shawn Hood >> 910.670.1819 m >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pradhanparas at gmail.com Mon Oct 13 22:19:57 2008 From: pradhanparas at gmail.com (Paras pradhan) Date: Mon, 13 Oct 2008 17:19:57 -0500 Subject: [Linux-cluster] debuggin Message-ID: <8b711df40810131519i56cd4f60k794d84587965cabe@mail.gmail.com> My ha.cf entry looks like: node1: logfacility local0 keepalive 2 udpport 694 deadtime 15 warntime 5 initdead 60 ucast eth0 10.42.40.198 ucast eth0 10.42.40.26 auto_failback off stonith_host * suicide ha1.domain.local watchdog /dev/watchdog debugfile /var/log/ha-debug node ha1.domain.local node ha2.domain.local node2: logfacility local0 keepalive 2 udpport 694 deadtime 15 warntime 5 initdead 60 ucast eth0 10.42.40.198 ucast eth0 10.42.40.26 auto_failback off stonith_host * suicide ha2.domain.local watchdog /dev/watchdog debugfile /var/log/ha-debug node ha1.domain.local node ha2.domain.local What does the below log file on node2 means when I turn off the eth0 on node1. Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: node ha1.domain.local: is dead Oct 13 17:09:25 ha2 heartbeat: [6841]: info: Link ha1.domain.local:eth0 dead. Oct 13 17:09:25 ha2 heartbeat: [6980]: info: Resetting node ha1.domain.local with [Suicide STONITH device] Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: glib: ha2.domain.local doesn't control host [ha1.domain.local] Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: Host ha1.domain.local not reset! Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: Managed STONITH ha1.domain.local process 6980 exited with return code 1. Oct 13 17:09:25 ha2 heartbeat: [6841]: ERROR: STONITH of ha1.domain.local failed. Retrying... Oct 13 17:09:30 ha2 heartbeat: [6981]: info: Resetting node ha1.domain.local with [Suicide STONITH device] Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: glib: ha2.domain.local doesn't control host [ha1.domain.local] Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: Host ha1.domain.local not reset! I need node1 to be shutdown when eth0 on node1 is down. Any help will be greatly appreciated. Paras. 
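One common way to get at least the failover half of what is asked above, with a v1-style ha.cf like this one, is a ping node plus the ipfail plugin: heartbeat then notices when a node can no longer reach the ping target over eth0 and moves resources away from it. The gateway address below is a placeholder and the ipfail path varies by package (for example /usr/lib64/heartbeat/ipfail on x86_64); note that ipfail does not power the peer off, and as Andrew points out in the reply below, the heartbeat list is the better place for the details.

# additions to /etc/ha.d/ha.cf on both nodes -- illustrative only
ping 10.42.40.1                               # a router/gateway reachable via eth0 (placeholder)
respawn hacluster /usr/lib/heartbeat/ipfail   # adjust to where your heartbeat package installs ipfail
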
-------------- next part -------------- An HTML attachment was scrubbed... URL: From beekhof at gmail.com Tue Oct 14 10:00:16 2008 From: beekhof at gmail.com (Andrew Beekhof) Date: Tue, 14 Oct 2008 12:00:16 +0200 Subject: [Linux-cluster] debuggin In-Reply-To: <8b711df40810131519i56cd4f60k794d84587965cabe@mail.gmail.com> References: <8b711df40810131519i56cd4f60k794d84587965cabe@mail.gmail.com> Message-ID: <26ef5e70810140300y5ef4b044vd0211e6d90dd5d8d@mail.gmail.com> You;re better off asking about the (old) heartbeat resource manager on the heartbeat mailing list. 2008/10/14 Paras pradhan : > My ha.cf entry looks like: > node1: > > logfacility local0 > keepalive 2 > udpport 694 > deadtime 15 > warntime 5 > initdead 60 > ucast eth0 10.42.40.198 > ucast eth0 10.42.40.26 > auto_failback off > stonith_host * suicide ha1.domain.local > watchdog /dev/watchdog > debugfile /var/log/ha-debug > node ha1.domain.local > node ha2.domain.local > > node2: > logfacility local0 > keepalive 2 > udpport 694 > deadtime 15 > warntime 5 > initdead 60 > ucast eth0 10.42.40.198 > ucast eth0 10.42.40.26 > auto_failback off > stonith_host * suicide ha2.domain.local > watchdog /dev/watchdog > debugfile /var/log/ha-debug > node ha1.domain.local > node ha2.domain.local > What does the below log file on node2 means when I turn off the eth0 on > node1. > Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: node ha1.domain.local: is dead > Oct 13 17:09:25 ha2 heartbeat: [6841]: info: Link ha1.domain.local:eth0 > dead. > Oct 13 17:09:25 ha2 heartbeat: [6980]: info: Resetting node ha1.domain.local > with [Suicide STONITH device] > Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: glib: ha2.domain.local doesn't > control host [ha1.domain.local] > Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: Host ha1.domain.local not > reset! > Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: Managed STONITH > ha1.domain.local process 6980 exited with return code 1. > Oct 13 17:09:25 ha2 heartbeat: [6841]: ERROR: STONITH of ha1.domain.local > failed. Retrying... > Oct 13 17:09:30 ha2 heartbeat: [6981]: info: Resetting node ha1.domain.local > with [Suicide STONITH device] > Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: glib: ha2.domain.local doesn't > control host [ha1.domain.local] > Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: Host ha1.domain.local not > reset! > > > I need node1 to be shutdown when eth0 on node1 is down. > > > Any help will be greatly appreciated. > > Paras. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pradhanparas at gmail.com Tue Oct 14 13:48:02 2008 From: pradhanparas at gmail.com (Paras pradhan) Date: Tue, 14 Oct 2008 08:48:02 -0500 Subject: [Linux-cluster] debuggin In-Reply-To: <26ef5e70810140300y5ef4b044vd0211e6d90dd5d8d@mail.gmail.com> References: <8b711df40810131519i56cd4f60k794d84587965cabe@mail.gmail.com> <26ef5e70810140300y5ef4b044vd0211e6d90dd5d8d@mail.gmail.com> Message-ID: <8b711df40810140648y2c39032dq948fefb5442ff76c@mail.gmail.com> ok ! that was a mistake. sorry. Paras. On Tue, Oct 14, 2008 at 5:00 AM, Andrew Beekhof wrote: > You;re better off asking about the (old) heartbeat resource manager on > the heartbeat mailing list. 
> > 2008/10/14 Paras pradhan : > > My ha.cf entry looks like: > > node1: > > > > logfacility local0 > > keepalive 2 > > udpport 694 > > deadtime 15 > > warntime 5 > > initdead 60 > > ucast eth0 10.42.40.198 > > ucast eth0 10.42.40.26 > > auto_failback off > > stonith_host * suicide ha1.domain.local > > watchdog /dev/watchdog > > debugfile /var/log/ha-debug > > node ha1.domain.local > > node ha2.domain.local > > > > node2: > > logfacility local0 > > keepalive 2 > > udpport 694 > > deadtime 15 > > warntime 5 > > initdead 60 > > ucast eth0 10.42.40.198 > > ucast eth0 10.42.40.26 > > auto_failback off > > stonith_host * suicide ha2.domain.local > > watchdog /dev/watchdog > > debugfile /var/log/ha-debug > > node ha1.domain.local > > node ha2.domain.local > > What does the below log file on node2 means when I turn off the eth0 on > > node1. > > Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: node ha1.domain.local: is > dead > > Oct 13 17:09:25 ha2 heartbeat: [6841]: info: Link ha1.domain.local:eth0 > > dead. > > Oct 13 17:09:25 ha2 heartbeat: [6980]: info: Resetting node > ha1.domain.local > > with [Suicide STONITH device] > > Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: glib: ha2.domain.local > doesn't > > control host [ha1.domain.local] > > Oct 13 17:09:25 ha2 heartbeat: [6980]: ERROR: Host ha1.domain.local not > > reset! > > Oct 13 17:09:25 ha2 heartbeat: [6841]: WARN: Managed STONITH > > ha1.domain.local process 6980 exited with return code 1. > > Oct 13 17:09:25 ha2 heartbeat: [6841]: ERROR: STONITH of ha1.domain.local > > failed. Retrying... > > Oct 13 17:09:30 ha2 heartbeat: [6981]: info: Resetting node > ha1.domain.local > > with [Suicide STONITH device] > > Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: glib: ha2.domain.local > doesn't > > control host [ha1.domain.local] > > Oct 13 17:09:30 ha2 heartbeat: [6981]: ERROR: Host ha1.domain.local not > > reset! > > > > > > I need node1 to be shutdown when eth0 on node1 is down. > > > > > > Any help will be greatly appreciated. > > > > Paras. > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstoner at opsource.net Tue Oct 14 15:32:09 2008 From: jstoner at opsource.net (Jeff Stoner) Date: Tue, 14 Oct 2008 16:32:09 +0100 Subject: [Linux-cluster] Fencing quandry Message-ID: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> We had a "that totally sucks" event the other night involving fencing. In short - Red Hat 4.7, 2 node cluster using iLO fencing with HP blade servers: - passive node detemined active node was unresponsive (missed too many heartbeats) - passive node initiates take-over and begins fencing process - fencing agent successfully powers off blade server - fencing agent sits in an endless loop trying to power on the blade, which won't power up - the cluster appears "stalled" at this point because fencing won't complete I was able to complete the failover by swapping out the fencing agent with a shell script that does "exit 0". This allowed the fencing agent to complete so the resource manager could successfully relocate the service. My question becomes: why isn't a successful power off considered sufficient for a take-over of a service? If the power is off, you've guaranteed that all resources are released by that node. 
By requiring a successful power on (which may never happen due to hardware failure,) the fencing agent becomes a single point of failure in the cluster. The fencing agent should make an attempt to power on a down node but it shouldn't hold up the failover process if that attempt fails. --Jeff Performance Engineer OpSource, Inc. http://www.opsource.net "Your Success is Our Success" From james.hofmeister at hp.com Tue Oct 14 17:39:48 2008 From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux)) Date: Tue, 14 Oct 2008 17:39:48 +0000 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> Message-ID: Hello Jeff, I am working with RedHat on a RHEL-5 fencing issue with c-class blades... We have bugzilla 433864 opened for this and my notes state to be resolved in RHEL-5.3. We had a workaround in the RHEL-5 cluster configuration: In the /etc/cluster/cluster.conf *Update version number by 1. *Then edit the fence device section for "each" node for example: change to --> Regards, James Hofmeister Hewlett Packard Linux Solutions Engineer |-----Original Message----- |From: linux-cluster-bounces at redhat.com |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner |Sent: Tuesday, October 14, 2008 8:32 AM |To: linux clustering |Subject: [Linux-cluster] Fencing quandry | |We had a "that totally sucks" event the other night involving fencing. |In short - Red Hat 4.7, 2 node cluster using iLO fencing with HP blade |servers: | |- passive node detemined active node was unresponsive (missed too many |heartbeats) |- passive node initiates take-over and begins fencing process |- fencing agent successfully powers off blade server |- fencing agent sits in an endless loop trying to power on the |blade, which won't power up |- the cluster appears "stalled" at this point because fencing |won't complete | |I was able to complete the failover by swapping out the |fencing agent with a shell script that does "exit 0". This |allowed the fencing agent to complete so the resource manager |could successfully relocate the service. | |My question becomes: why isn't a successful power off |considered sufficient for a take-over of a service? If the |power is off, you've guaranteed that all resources are |released by that node. By requiring a successful power on |(which may never happen due to hardware failure,) the fencing |agent becomes a single point of failure in the cluster. The |fencing agent should make an attempt to power on a down node |but it shouldn't hold up the failover process if that attempt fails. | | | |--Jeff |Performance Engineer | |OpSource, Inc. |http://www.opsource.net |"Your Success is Our Success" | | |-- |Linux-cluster mailing list |Linux-cluster at redhat.com |https://www.redhat.com/mailman/listinfo/linux-cluster | From jstoner at opsource.net Tue Oct 14 22:43:14 2008 From: jstoner at opsource.net (Jeff Stoner) Date: Tue, 14 Oct 2008 23:43:14 +0100 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> Message-ID: <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> Thanks for the response, James. Unfortunately, it doesn't fully answer my question or at least, I'm not following the logic. The bug report would seem to indicate a problem with using the default "reboot" method of the agent. 
The work around simply replaces the single fence device ('reboot') with 2 fence devices ('off' followed by 'on') in the same fence method. If the server fails to power on, then, according to the FAQ, fencing still fails ("All fence devices within a fence method must succeed in order for the method to succeed"). I'm back to fenced being a SPoF if hardware failures prevent a fenced node from powering on. --Jeff Performance Engineer OpSource, Inc. http://www.opsource.net "Your Success is Our Success" > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Hofmeister, James (WTEC Linux) > Sent: Tuesday, October 14, 2008 1:40 PM > To: linux clustering > Subject: [Linux-cluster] RE: Fencing quandry > > Hello Jeff, > > I am working with RedHat on a RHEL-5 fencing issue with > c-class blades... We have bugzilla 433864 opened for this > and my notes state to be resolved in RHEL-5.3. > > We had a workaround in the RHEL-5 cluster configuration: > > In the /etc/cluster/cluster.conf > > *Update version number by 1. > *Then edit the fence device section for "each" node for example: > > > > > > > change to --> > > > action="off"/> > action="on"/> > > > > Regards, > James Hofmeister > Hewlett Packard Linux Solutions Engineer > > > > |-----Original Message----- > |From: linux-cluster-bounces at redhat.com > |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner > |Sent: Tuesday, October 14, 2008 8:32 AM > |To: linux clustering > |Subject: [Linux-cluster] Fencing quandry > | > |We had a "that totally sucks" event the other night > involving fencing. > |In short - Red Hat 4.7, 2 node cluster using iLO fencing > with HP blade > |servers: > | > |- passive node detemined active node was unresponsive > (missed too many > |heartbeats) > |- passive node initiates take-over and begins fencing process > |- fencing agent successfully powers off blade server > |- fencing agent sits in an endless loop trying to power on the > |blade, which won't power up > |- the cluster appears "stalled" at this point because fencing > |won't complete > | > |I was able to complete the failover by swapping out the > |fencing agent with a shell script that does "exit 0". This > |allowed the fencing agent to complete so the resource manager > |could successfully relocate the service. > | > |My question becomes: why isn't a successful power off > |considered sufficient for a take-over of a service? If the > |power is off, you've guaranteed that all resources are > |released by that node. By requiring a successful power on > |(which may never happen due to hardware failure,) the fencing > |agent becomes a single point of failure in the cluster. The > |fencing agent should make an attempt to power on a down node > |but it shouldn't hold up the failover process if that attempt fails. > | > | > | > |--Jeff > |Performance Engineer > | > |OpSource, Inc. 
> |http://www.opsource.net > |"Your Success is Our Success" > | > | > |-- > |Linux-cluster mailing list > |Linux-cluster at redhat.com > |https://www.redhat.com/mailman/listinfo/linux-cluster > | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From andres.mujica at seaq.com.co Tue Oct 14 22:56:48 2008 From: andres.mujica at seaq.com.co (Andres Mauricio Mujica Zalamea) Date: Tue, 14 Oct 2008 17:56:48 -0500 (COT) Subject: [Linux-cluster] 2 phy hosts (domain0) hosting 3 vm (domU) with clustered services between them and san storage Message-ID: <1420.200.1.81.99.1224025008.squirrel@webmail.seaq.com.co> Hi, all I've got this deployment. We've got 2 physical servers that are hosting 3 domUs with clustered services between them. We're presenting one multipath device from the same disks of the SAN to each guest, so i've got /dev/mpath/mpath0 mapped from phy node 1 to guest 1 and /dev/mpath/mpath0 mapped from phy node 2 to guest 2. Using luci i've configured the storage but i'm seeing an odd behaviour, for example the partition table is not inmediately seen by the other guest node from the cluster. For the VG to be seen i need to restart the remote guest. And the worst part is that after formatting with GFS2 if i touch a file in one guest node, i would expect the same file to be seen at the other node, but the truth is not, the file is not seen after a reboot or mount/remount... any ideas? i hope i could explained myself a litle bit... Andr?s Mauricio Mujica Zalamea From vipcert at yahoo.com Wed Oct 15 04:33:19 2008 From: vipcert at yahoo.com (Vipin Sharma) Date: Tue, 14 Oct 2008 21:33:19 -0700 (PDT) Subject: [Linux-cluster] lock issue with gfs and gfs2 Message-ID: <358177.12602.qm@web56701.mail.re3.yahoo.com> Hi, Let me try to explain my issue with gfs/gfs2 filesystem. I have two node cluster and a shared gfs filesystem which is mounted on both the nodes at the same time. I can access the filesystem from both nodes. 1. From node A I put a lock on a file called testfile and tried to put the lock on testfile from node B. I get message, file is already locked, which is good since file is locked from ndoe A. 2. Now unmount the filesystem on node B while lock is still there on testfile from node A and mount it back. Now try to put lock on the testfile from node B which is locked from node A. Expected result would be not to succeed in puting lock from node B, but "NO" I am able to put the lock from node B. 3. Node B does not know that there is some lock on testfile form node A but now if you release the lock from node A and put it again and then try to put lock on testfile from node B it works as expected means you will not be able to put lock on testfile. It says file is already locked. It does not make any difference if I use gfs or gfs2 test works same way I tried on Oracle enterprise linux 5.1 and 5.2, which is nothing but redhat. Also node A or node B test results are same. I have lock file which is compiled one but the following program also works the same way. =========================================================================== /* ** lockdemo.c -- shows off your system's file locking. Rated R. 
*/ #include #include #include #include #include int main(int argc, char *argv[]) { /* l_type l_whence l_start l_len l_pid */ struct flock fl = { F_WRLCK, SEEK_SET, 0, 0, 0 }; int fd; fl.l_pid = getpid(); if (argc > 1) fl.l_type = F_RDLCK; if ((fd = open("lockdemo.c", O_RDWR)) == -1) { perror("open"); exit(1); } printf("Press to try to get lock: "); getchar(); printf("Trying to get lock..."); if (fcntl(fd, F_SETLKW, &fl) == -1) { perror("fcntl"); exit(1); } printf("got lock\n"); printf("Press to release lock: "); getchar(); fl.l_type = F_UNLCK; /* set to unlock same region */ if (fcntl(fd, F_SETLK, &fl) == -1) { perror("fcntl"); exit(1); } printf("Unlocked.\n"); close(fd); } ============================================================================= I will be happy to provide more details as requested. TIA vip -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Wed Oct 15 07:15:27 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 15 Oct 2008 08:15:27 +0100 Subject: [Linux-cluster] lock issue with gfs and gfs2 In-Reply-To: <358177.12602.qm@web56701.mail.re3.yahoo.com> References: <358177.12602.qm@web56701.mail.re3.yahoo.com> Message-ID: <1224054927.25004.66.camel@quoit> Hi, Please file a bug at bugzilla.redhat.com, Steve. On Tue, 2008-10-14 at 21:33 -0700, Vipin Sharma wrote: > Hi, > > Let me try to explain my issue with gfs/gfs2 filesystem. I have two > node cluster and a shared gfs filesystem which is mounted on both the > nodes at the same time. I can access the filesystem from both nodes. > 1. From node A I put a lock on a file called testfile and tried to put > the lock on testfile from node B. I get message, file is already > locked, which is good since file is locked from ndoe A. > 2. Now unmount the filesystem on node B while lock is still there on > testfile from node A and mount it back. Now try to put lock on the > testfile from node B which is locked from node A. Expected result > would be not to succeed in puting lock from node B, but "NO" I am able > to put the lock from node B. > 3. Node B does not know that there is some lock on testfile form node > A but now if you release the lock from node A and put it again and > then try to put lock on testfile from node B it works as expected > means you will not be able to put lock on testfile. It says file is > already locked. > > It does not make any difference if I use gfs or gfs2 test works same > way I tried on Oracle enterprise linux 5.1 and 5.2, which is nothing > but redhat. Also node A or node B test results are same. > > I have lock file which is compiled one but the following program also > works the same way. > =========================================================================== > /* > ** lockdemo.c -- shows off your system's file locking. Rated R. 
> */ > > #include > #include > #include > #include > #include > > int main(int argc, char *argv[]) > { > /* l_type l_whence l_start l_len l_pid */ > struct flock fl = { F_WRLCK, SEEK_SET, 0, 0, 0 }; > int fd; > > fl.l_pid = getpid(); > > if (argc > 1) > fl.l_type = F_RDLCK; > > if ((fd = open("lockdemo.c", O_RDWR)) == -1) { > perror("open"); > exit(1); > } > > printf("Press to try to get lock: "); > getchar(); > printf("Trying to get lock..."); > > if (fcntl(fd, F_SETLKW, &fl) == -1) { > perror("fcntl"); > exit(1); > } > > printf("got lock\n"); > printf("Press to release lock: "); > getchar(); > > fl.l_type = F_UNLCK; /* set to unlock same region */ > > if (fcntl(fd, F_SETLK, &fl) == -1) { > perror("fcntl"); > exit(1); > } > > printf("Unlocked.\n"); > > close(fd); > } > > ============================================================================= > > I will be happy to provide more details as requested. > > TIA > vip > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From erling.nygaard at gmail.com Wed Oct 15 07:53:04 2008 From: erling.nygaard at gmail.com (Erling Nygaard) Date: Wed, 15 Oct 2008 09:53:04 +0200 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> Message-ID: Jeff If you do not need the fenced node to come back (in your case it can not come back due to the hardware issues) you can remove the "on" fence action and simply have the fence device issue a "off" command. This should return a success. In this case the fenced node will never return to life without human interaction, but that is no worse than the situation you are in now. Erling On Wed, Oct 15, 2008 at 12:43 AM, Jeff Stoner wrote: > Thanks for the response, James. Unfortunately, it doesn't fully answer > my question or at least, I'm not following the logic. The bug report > would seem to indicate a problem with using the default "reboot" method > of the agent. The work around simply replaces the single fence device > ('reboot') with 2 fence devices ('off' followed by 'on') in the same > fence method. If the server fails to power on, then, according to the > FAQ, fencing still fails ("All fence devices within a fence method must > succeed in order for the method to succeed"). > > I'm back to fenced being a SPoF if hardware failures prevent a fenced > node from powering on. > > --Jeff > Performance Engineer > > OpSource, Inc. > http://www.opsource.net > "Your Success is Our Success" > > >> -----Original Message----- >> From: linux-cluster-bounces at redhat.com >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of >> Hofmeister, James (WTEC Linux) >> Sent: Tuesday, October 14, 2008 1:40 PM >> To: linux clustering >> Subject: [Linux-cluster] RE: Fencing quandry >> >> Hello Jeff, >> >> I am working with RedHat on a RHEL-5 fencing issue with >> c-class blades... We have bugzilla 433864 opened for this >> and my notes state to be resolved in RHEL-5.3. >> >> We had a workaround in the RHEL-5 cluster configuration: >> >> In the /etc/cluster/cluster.conf >> >> *Update version number by 1. 
>> *Then edit the fence device section for "each" node for example: >> >> >> >> >> >> >> change to --> >> >> >> > action="off"/> >> > action="on"/> >> >> >> >> Regards, >> James Hofmeister >> Hewlett Packard Linux Solutions Engineer >> >> >> >> |-----Original Message----- >> |From: linux-cluster-bounces at redhat.com >> |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner >> |Sent: Tuesday, October 14, 2008 8:32 AM >> |To: linux clustering >> |Subject: [Linux-cluster] Fencing quandry >> | >> |We had a "that totally sucks" event the other night >> involving fencing. >> |In short - Red Hat 4.7, 2 node cluster using iLO fencing >> with HP blade >> |servers: >> | >> |- passive node detemined active node was unresponsive >> (missed too many >> |heartbeats) >> |- passive node initiates take-over and begins fencing process >> |- fencing agent successfully powers off blade server >> |- fencing agent sits in an endless loop trying to power on the >> |blade, which won't power up >> |- the cluster appears "stalled" at this point because fencing >> |won't complete >> | >> |I was able to complete the failover by swapping out the >> |fencing agent with a shell script that does "exit 0". This >> |allowed the fencing agent to complete so the resource manager >> |could successfully relocate the service. >> | >> |My question becomes: why isn't a successful power off >> |considered sufficient for a take-over of a service? If the >> |power is off, you've guaranteed that all resources are >> |released by that node. By requiring a successful power on >> |(which may never happen due to hardware failure,) the fencing >> |agent becomes a single point of failure in the cluster. The >> |fencing agent should make an attempt to power on a down node >> |but it shouldn't hold up the failover process if that attempt fails. >> | >> | >> | >> |--Jeff >> |Performance Engineer >> | >> |OpSource, Inc. >> |http://www.opsource.net >> |"Your Success is Our Success" >> | >> | >> |-- >> |Linux-cluster mailing list >> |Linux-cluster at redhat.com >> |https://www.redhat.com/mailman/listinfo/linux-cluster >> | >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From virginian at blueyonder.co.uk Wed Oct 15 15:01:45 2008 From: virginian at blueyonder.co.uk (Virginian) Date: Wed, 15 Oct 2008 16:01:45 +0100 Subject: [Linux-cluster] Strange error messages in /var/log/messages Message-ID: Hi all, I am running Centos 5.2 on a two node physical cluster with Xen virtualisation and 4 domains clustered underneath. I am see the following in /var/log/messages on one of the physical nodes: Oct 15 15:53:13 xen2 avahi-daemon[3363]: New relevant interface eth0.IPv4 for mDNS. Oct 15 15:53:13 xen2 avahi-daemon[3363]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.199.10.170. Oct 15 15:53:14 xen2 avahi-daemon[3363]: Network interface enumeration completed. Oct 15 15:53:14 xen2 avahi-daemon[3363]: Registering new address record for fe80::200:ff:fe00:0 on virbr0. Oct 15 15:53:14 xen2 avahi-daemon[3363]: Registering new address record for 192.168.122.1 on virbr0. Oct 15 15:53:14 xen2 avahi-daemon[3363]: Registering new address record for fe80::202:a5ff:fed9:ef74 on eth0. Oct 15 15:53:14 xen2 avahi-daemon[3363]: Registering new address record for 10.199.10.170 on eth0. 
Oct 15 15:53:14 xen2 avahi-daemon[3363]: Registering HINFO record with values 'I686'/'LINUX'. Oct 15 15:53:15 xen2 avahi-daemon[3363]: Server startup complete. Host name is xen2.local. Local service cookie is 3231388299. Oct 15 15:53:16 xen2 avahi-daemon[3363]: Service "SFTP File Transfer on xen2" (/services/sftp-ssh.service) successfully established. Oct 15 15:53:23 xen2 xenstored: Checking store ... Oct 15 15:53:23 xen2 xenstored: Checking store complete. Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 ccsd[2806]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 15:53:24 xen2 ccsd[2806]: Error while processing get: Invalid argument Oct 15 15:53:24 xen2 modclusterd: startup succeeded Oct 15 15:53:24 xen2 clurgmgrd[3531]: Resource Group Manager Starting Oct 15 15:53:25 xen2 oddjobd: oddjobd startup succeeded Oct 15 15:53:26 xen2 saslauthd[3885]: detach_tty : master pid is: 3885 Oct 15 15:53:26 xen2 saslauthd[3885]: ipc_init : listening on socket: /var/run/saslauthd/mux Oct 15 15:53:26 xen2 ricci: startup succeeded Oct 15 15:53:39 xen2 clurgmgrd[3531]: Starting stopped service vm:hermes Oct 15 15:53:39 xen2 clurgmgrd[3531]: Starting stopped service vm:hestia Oct 15 15:53:43 xen2 kernel: tap tap-1-51712: 2 getting info Oct 15 15:53:44 xen2 kernel: tap tap-1-51728: 2 getting info Oct 15 15:53:45 xen2 kernel: device vif1.0 entered promiscuous mode Oct 15 15:53:45 xen2 kernel: ADDRCONF(NETDEV_UP): vif1.0: link is not ready Oct 15 15:53:47 xen2 kernel: tap tap-2-51712: 2 getting info Oct 15 15:53:48 xen2 kernel: tap tap-2-51728: 2 getting info Oct 15 15:53:48 xen2 kernel: device vif2.0 entered promiscuous mode Oct 15 15:53:48 xen2 kernel: ADDRCONF(NETDEV_UP): vif2.0: link is not ready Oct 15 15:53:49 xen2 clurgmgrd[3531]: Service vm:hestia started Oct 15 15:53:49 xen2 clurgmgrd[3531]: Service vm:hermes started Oct 15 15:53:53 xen2 kernel: blktap: ring-ref 8, event-channel 11, protocol 1 (x86_32-abi) Oct 15 15:53:53 xen2 kernel: blktap: ring-ref 9, event-channel 12, protocol 1 (x86_32-abi) Oct 15 15:53:53 xen2 kernel: blktap: ring-ref 8, event-channel 11, protocol 1 (x86_32-abi) Oct 15 15:53:53 xen2 kernel: blktap: ring-ref 9, event-channel 12, protocol 1 (x86_32-abi) Oct 15 15:54:23 xen2 kernel: ADDRCONF(NETDEV_CHANGE): vif2.0: link becomes ready Oct 15 15:54:23 xen2 kernel: xenbr0: topology change detected, propagating Oct 15 15:54:23 xen2 kernel: xenbr0: port 4(vif2.0) entering forwarding state Oct 15 15:54:27 xen2 kernel: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready Oct 15 15:54:27 xen2 kernel: xenbr0: topology change detected, propagating Oct 15 15:54:27 xen2 kernel: xenbr0: port 3(vif1.0) entering 
forwarding state Oct 15 15:56:15 xen2 clurgmgrd[3531]: Resource Groups Locked My cluster.conf is as follows: cluster.conf 100% 1734 1.7KB/s 00:00 [root at xen1 cluster]# cat /etc/cluster/cluster.conf.15102008 Does anybody know what these messages mean? My domain cluster.conf is as follows: Thanks John -------------- next part -------------- An HTML attachment was scrubbed... URL: From jparsons at redhat.com Wed Oct 15 15:42:25 2008 From: jparsons at redhat.com (jim parsons) Date: Wed, 15 Oct 2008 11:42:25 -0400 Subject: [Linux-cluster] Strange error messages in /var/log/messages In-Reply-To: References: Message-ID: <1224085345.3277.2.camel@localhost.localdomain> On Wed, 2008-10-15 at 16:01 +0100, Virginian wrote: > This tag does not need to be in the inner clusters' (dom u cluster) conf file, only the cluster set up on the physical hosts. That might be the problem - easy enough to check! :) -j From jparsons at redhat.com Wed Oct 15 15:48:45 2008 From: jparsons at redhat.com (jim parsons) Date: Wed, 15 Oct 2008 11:48:45 -0400 Subject: [Linux-cluster] Strange error messages in /var/log/messages In-Reply-To: <1224085345.3277.2.camel@localhost.localdomain> References: <1224085345.3277.2.camel@localhost.localdomain> Message-ID: <1224085725.3277.4.camel@localhost.localdomain> On Wed, 2008-10-15 at 11:42 -0400, jim parsons wrote: > On Wed, 2008-10-15 at 16:01 +0100, Virginian wrote: > > > > > This tag does not need to be in the inner clusters' (dom u cluster) conf file, only the cluster set up on the physical hosts. > > That might be the problem - easy enough to check! :) It would be fun to know if the above fixes the issue. Please let me know. -j From teigland at redhat.com Wed Oct 15 17:16:15 2008 From: teigland at redhat.com (David Teigland) Date: Wed, 15 Oct 2008 12:16:15 -0500 Subject: [Linux-cluster] lock issue with gfs and gfs2 In-Reply-To: <358177.12602.qm@web56701.mail.re3.yahoo.com> References: <358177.12602.qm@web56701.mail.re3.yahoo.com> Message-ID: <20081015171615.GD30528@redhat.com> On Tue, Oct 14, 2008 at 09:33:19PM -0700, Vipin Sharma wrote: > Hi, > > Let me try to explain my issue with gfs/gfs2 filesystem. I have two node > cluster and a shared gfs filesystem which is mounted on both the nodes > at the same time. I can access the filesystem from both nodes. 1. From > node A I put a lock on a file called testfile and tried to put the lock > on testfile from node B. I get message, file is already locked, which is > good since file is locked from ndoe A. 2. Now unmount the filesystem on > node B while lock is still there on testfile from node A and mount it > back. Now try to put lock on the testfile from node B which is locked > from node A. Expected result would be not to succeed in puting lock from > node B, but "NO" I am able to put the lock from node B. 3. Node B does > not know that there is some lock on testfile form node A but now if you > release the lock from node A and put it again and then try to put lock > on testfile from node B it works as expected means you will not be able > to put lock on testfile. It says file is already locked. > > It does not make any difference if I use gfs or gfs2 test works same way > I tried on Oracle enterprise linux 5.1 and 5.2, which is nothing but > redhat. Also node A or node B test results are same. gfs_controld uses checkpoints to sync plock state to new nodes; it appears that there's something going wrong with that. After running your simple test, run 'group_tool dump gfs ' from both nodes and include the output in the bz. 
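As an illustration, the debug collection Dave describes could be scripted
along these lines; the node names (node-a, node-b), the filesystem name
"testfs", and root ssh access between the machines are assumptions, so
adjust them to match the actual cluster:

    # capture gfs_controld state (including plock checkpoint info) on both nodes
    for n in node-a node-b; do
        ssh root@$n "group_tool ls; group_tool dump gfs testfs" > group_tool-gfs-$n.txt
    done

Attaching the two resulting files to the bugzilla keeps the per-node output
separate.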
Thanks, Dave From virginian at blueyonder.co.uk Wed Oct 15 19:10:44 2008 From: virginian at blueyonder.co.uk (Virginian) Date: Wed, 15 Oct 2008 20:10:44 +0100 Subject: [Linux-cluster] Strange error messages in /var/log/messages References: <1224085345.3277.2.camel@localhost.localdomain> <1224085725.3277.4.camel@localhost.localdomain> Message-ID: Hi Jim, I changed the domU cluster config as you suggested then rebooted the whole caboodle but still get the same messages: Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Oct 15 18:17:23 xen2 ccsd[2818]: Error: unable to evaluate xpath query "/cluster/fence_xvmd/@(null) " Oct 15 18:17:23 xen2 ccsd[2818]: Error while processing get: Invalid argument Thanks for the suggestion though and at least the domU configs are now corrected which is a plus. Regards John ----- Original Message ----- From: "jim parsons" To: "linux clustering" Sent: Wednesday, October 15, 2008 4:48 PM Subject: Re: [Linux-cluster] Strange error messages in /var/log/messages > On Wed, 2008-10-15 at 11:42 -0400, jim parsons wrote: >> On Wed, 2008-10-15 at 16:01 +0100, Virginian wrote: >> >> > >> >> This tag does not need to be in the inner clusters' (dom u cluster) conf >> file, only the cluster set up on the physical hosts. >> >> That might be the problem - easy enough to check! :) > > It would be fun to know if the above fixes the issue. Please let me > know. > > -j > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From lhh at redhat.com Wed Oct 15 19:18:09 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 15 Oct 2008 15:18:09 -0400 Subject: [Linux-cluster] Strange error messages in /var/log/messages In-Reply-To: <1224085725.3277.4.camel@localhost.localdomain> References: <1224085345.3277.2.camel@localhost.localdomain> <1224085725.3277.4.camel@localhost.localdomain> Message-ID: <1224098289.5912.5.camel@ayanami> On Wed, 2008-10-15 at 11:48 -0400, jim parsons wrote: > On Wed, 2008-10-15 at 11:42 -0400, jim parsons wrote: > > On Wed, 2008-10-15 at 16:01 +0100, Virginian wrote: > > > > > > > > > This tag does not need to be in the inner clusters' (dom u cluster) conf file, only the cluster set up on the physical hosts. > > > > That might be the problem - easy enough to check! :) > > It would be fun to know if the above fixes the issue. Please let me > know. I think I see it. -- Lon -------------- next part -------------- A non-text attachment was scrubbed... 
Name: fence_xvmd-ccs.patch Type: text/x-patch Size: 427 bytes Desc: not available URL: From james.hofmeister at hp.com Wed Oct 15 20:45:46 2008 From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux)) Date: Wed, 15 Oct 2008 20:45:46 +0000 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> Message-ID: Hello Jeff, RE: [Linux-cluster] RE: Fencing quandary The root issue is the ILO scripts are not up to date with the current firmware rev in the c-class and p-class blades. The method of '' for a "reboot" is not working with this ILO firmware rev and the workaround is to send 2 commands to ILO under a single method... 'action="off"/' and 'action="on"/'. I had tested this with my p-class blades and it was successful. I am still waiting for my customers test results on their c-class blades. ...yes this is the root issue to the ILO problem, but it does not completely address your concern. I believe you are saying: That the RHCS does not accept a "power off" as a fence, but is requiring both "power off" followed by "power on". Regards, James Hofmeister Hewlett Packard Linux Solutions Engineer |-----Original Message----- |From: linux-cluster-bounces at redhat.com |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner |Sent: Tuesday, October 14, 2008 3:43 PM |To: linux clustering |Subject: RE: [Linux-cluster] RE: Fencing quandry | |Thanks for the response, James. Unfortunately, it doesn't |fully answer my question or at least, I'm not following the |logic. The bug report would seem to indicate a problem with |using the default "reboot" method of the agent. The work |around simply replaces the single fence device |('reboot') with 2 fence devices ('off' followed by 'on') in |the same fence method. If the server fails to power on, then, |according to the FAQ, fencing still fails ("All fence devices |within a fence method must succeed in order for the method to |succeed"). | |I'm back to fenced being a SPoF if hardware failures prevent a |fenced node from powering on. | |--Jeff |Performance Engineer | |OpSource, Inc. |http://www.opsource.net |"Your Success is Our Success" | | |> -----Original Message----- |> From: linux-cluster-bounces at redhat.com |> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Hofmeister, |> James (WTEC Linux) |> Sent: Tuesday, October 14, 2008 1:40 PM |> To: linux clustering |> Subject: [Linux-cluster] RE: Fencing quandry |> |> Hello Jeff, |> |> I am working with RedHat on a RHEL-5 fencing issue with c-class |> blades... We have bugzilla 433864 opened for this and my |notes state |> to be resolved in RHEL-5.3. |> |> We had a workaround in the RHEL-5 cluster configuration: |> |> In the /etc/cluster/cluster.conf |> |> *Update version number by 1. 
|> *Then edit the fence device section for "each" node for example: |> |> |> |> |> |> |> change to --> |> |> |> action="off"/> |> action="on"/> |> |> |> |> Regards, |> James Hofmeister |> Hewlett Packard Linux Solutions Engineer |> |> |> |> |-----Original Message----- |> |From: linux-cluster-bounces at redhat.com |> |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner |> |Sent: Tuesday, October 14, 2008 8:32 AM |> |To: linux clustering |> |Subject: [Linux-cluster] Fencing quandry |> | |> |We had a "that totally sucks" event the other night |> involving fencing. |> |In short - Red Hat 4.7, 2 node cluster using iLO fencing |> with HP blade |> |servers: |> | |> |- passive node detemined active node was unresponsive |> (missed too many |> |heartbeats) |> |- passive node initiates take-over and begins fencing process |> |- fencing agent successfully powers off blade server |> |- fencing agent sits in an endless loop trying to power on |the blade, |> |which won't power up |> |- the cluster appears "stalled" at this point because fencing won't |> |complete |> | |> |I was able to complete the failover by swapping out the |fencing agent |> |with a shell script that does "exit 0". This allowed the fencing |> |agent to complete so the resource manager could |successfully relocate |> |the service. |> | |> |My question becomes: why isn't a successful power off considered |> |sufficient for a take-over of a service? If the power is |off, you've |> |guaranteed that all resources are released by that node. By |requiring |> |a successful power on (which may never happen due to hardware |> |failure,) the fencing agent becomes a single point of |failure in the |> |cluster. The fencing agent should make an attempt to power |on a down |> |node but it shouldn't hold up the failover process if that attempt |> |fails. |> | |> | |> | |> |--Jeff |> |Performance Engineer |> | |> |OpSource, Inc. |> |http://www.opsource.net |> |"Your Success is Our Success" |> | |> | |> |-- |> |Linux-cluster mailing list |> |Linux-cluster at redhat.com |> |https://www.redhat.com/mailman/listinfo/linux-cluster |> | |> |> -- |> Linux-cluster mailing list |> Linux-cluster at redhat.com |> https://www.redhat.com/mailman/listinfo/linux-cluster |> |> | |-- |Linux-cluster mailing list |Linux-cluster at redhat.com |https://www.redhat.com/mailman/listinfo/linux-cluster | From jparsons at redhat.com Wed Oct 15 21:38:11 2008 From: jparsons at redhat.com (jim parsons) Date: Wed, 15 Oct 2008 17:38:11 -0400 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> Message-ID: <1224106691.3367.19.camel@localhost.localdomain> On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote: > Hello Jeff, > > RE: [Linux-cluster] RE: Fencing quandary > > The root issue is the ILO scripts are not up to date with the current firmware rev in the c-class and p-class blades. > > The method of '' for a "reboot" is not working with this ILO firmware rev and the workaround is to send 2 commands to ILO under a single method... 'action="off"/' and 'action="on"/'. > > I had tested this with my p-class blades and it was successful. I am still waiting for my customers test results on their c-class blades. > > ...yes this is the root issue to the ILO problem, but it does not completely address your concern. 
I believe you are saying: That the RHCS does not accept a "power off" as a fence, but is requiring both "power off" followed by "power on". Right. It is failing because the 'power on' portion is not completing because the fence agent is unable to send the correct power on command. With all due respect to HP's iLO, along with DRAC, RSA, RSB, etc, keeping up wee little delta's between firmware versions of baseboard management devices is challenging. Please pull down the very latest version of the agent and try it. For the time being, you could just use the power off command and walk over and turn it back on if it is convenient :). You could also run the agent from the command line with the verbose output switch set (man fence_ilo) and see if you can determine why the command is failing. Post what you find here. The agent is written in Perl and pretty easy to understand I think, if you are adventurous. The upcoming 5.3 ilo agent has been rewritten to include additional connection types, and is being heavily tested now on many firmware versions. The beta is close to release. Grab it when you can. -j > > Regards, > James Hofmeister > Hewlett Packard Linux Solutions Engineer > > |-----Original Message----- > |From: linux-cluster-bounces at redhat.com > |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner > |Sent: Tuesday, October 14, 2008 3:43 PM > |To: linux clustering > |Subject: RE: [Linux-cluster] RE: Fencing quandry > | > |Thanks for the response, James. Unfortunately, it doesn't > |fully answer my question or at least, I'm not following the > |logic. The bug report would seem to indicate a problem with > |using the default "reboot" method of the agent. The work > |around simply replaces the single fence device > |('reboot') with 2 fence devices ('off' followed by 'on') in > |the same fence method. If the server fails to power on, then, > |according to the FAQ, fencing still fails ("All fence devices > |within a fence method must succeed in order for the method to > |succeed"). > | > |I'm back to fenced being a SPoF if hardware failures prevent a > |fenced node from powering on. > | > |--Jeff > |Performance Engineer > | > |OpSource, Inc. > |http://www.opsource.net > |"Your Success is Our Success" > | > | > |> -----Original Message----- > |> From: linux-cluster-bounces at redhat.com > |> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Hofmeister, > |> James (WTEC Linux) > |> Sent: Tuesday, October 14, 2008 1:40 PM > |> To: linux clustering > |> Subject: [Linux-cluster] RE: Fencing quandry > |> > |> Hello Jeff, > |> > |> I am working with RedHat on a RHEL-5 fencing issue with c-class > |> blades... We have bugzilla 433864 opened for this and my > |notes state > |> to be resolved in RHEL-5.3. > |> > |> We had a workaround in the RHEL-5 cluster configuration: > |> > |> In the /etc/cluster/cluster.conf > |> > |> *Update version number by 1. 
> |> *Then edit the fence device section for "each" node for example: > |> > |> > |> > |> > |> > |> > |> change to --> > |> > |> > |> |> action="off"/> > |> |> action="on"/> > |> > |> > |> > |> Regards, > |> James Hofmeister > |> Hewlett Packard Linux Solutions Engineer > |> > |> > |> > |> |-----Original Message----- > |> |From: linux-cluster-bounces at redhat.com > |> |[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Stoner > |> |Sent: Tuesday, October 14, 2008 8:32 AM > |> |To: linux clustering > |> |Subject: [Linux-cluster] Fencing quandry > |> | > |> |We had a "that totally sucks" event the other night > |> involving fencing. > |> |In short - Red Hat 4.7, 2 node cluster using iLO fencing > |> with HP blade > |> |servers: > |> | > |> |- passive node detemined active node was unresponsive > |> (missed too many > |> |heartbeats) > |> |- passive node initiates take-over and begins fencing process > |> |- fencing agent successfully powers off blade server > |> |- fencing agent sits in an endless loop trying to power on > |the blade, > |> |which won't power up > |> |- the cluster appears "stalled" at this point because fencing won't > |> |complete > |> | > |> |I was able to complete the failover by swapping out the > |fencing agent > |> |with a shell script that does "exit 0". This allowed the fencing > |> |agent to complete so the resource manager could > |successfully relocate > |> |the service. > |> | > |> |My question becomes: why isn't a successful power off considered > |> |sufficient for a take-over of a service? If the power is > |off, you've > |> |guaranteed that all resources are released by that node. By > |requiring > |> |a successful power on (which may never happen due to hardware > |> |failure,) the fencing agent becomes a single point of > |failure in the > |> |cluster. The fencing agent should make an attempt to power > |on a down > |> |node but it shouldn't hold up the failover process if that attempt > |> |fails. > |> | > |> | > |> | > |> |--Jeff > |> |Performance Engineer > |> | > |> |OpSource, Inc. > |> |http://www.opsource.net > |> |"Your Success is Our Success" > |> | > |> | > |> |-- > |> |Linux-cluster mailing list > |> |Linux-cluster at redhat.com > |> |https://www.redhat.com/mailman/listinfo/linux-cluster > |> | > |> > |> -- > |> Linux-cluster mailing list > |> Linux-cluster at redhat.com > |> https://www.redhat.com/mailman/listinfo/linux-cluster > |> > |> > | > |-- > |Linux-cluster mailing list > |Linux-cluster at redhat.com > |https://www.redhat.com/mailman/listinfo/linux-cluster > | > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kanderso at redhat.com Wed Oct 15 21:42:23 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Wed, 15 Oct 2008 16:42:23 -0500 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: <1224106691.3367.19.camel@localhost.localdomain> References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> <1224106691.3367.19.camel@localhost.localdomain> Message-ID: <1224106943.2991.59.camel@dhcp80-204.msp.redhat.com> On Wed, 2008-10-15 at 17:38 -0400, jim parsons wrote: > On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote: > > Hello Jeff, > > > > RE: [Linux-cluster] RE: Fencing quandary > > > > The root issue is the ILO scripts are not up to date with the current firmware rev in the c-class and p-class blades. 
> >
> > The method of '' for a "reboot" is not working with this ILO firmware
> > rev and the workaround is to send 2 commands to ILO under a single
> > method... 'action="off"/' and 'action="on"/'.
> >
> > I had tested this with my p-class blades and it was successful. I am
> > still waiting for my customers test results on their c-class blades.
> >
> > ...yes this is the root issue to the ILO problem, but it does not
> > completely address your concern. I believe you are saying: That the
> > RHCS does not accept a "power off" as a fence, but is requiring both
> > "power off" followed by "power on".
> Right. It is failing because the 'power on' portion is not completing
> because the fence agent is unable to send the correct power on command.
>
But the point is, even if the power on command fails, the fencing agent
should report success, since the real need is to ensure the machine is no
longer participating in the cluster and not bring it back up. So, is it
proper to report success if part of the request fails as long as the
critical part succeeds?

Kevin

From andres.mujica at seaq.com.co  Wed Oct 15 21:42:52 2008
From: andres.mujica at seaq.com.co (Andres Mauricio Mujica Zalamea)
Date: Wed, 15 Oct 2008 16:42:52 -0500 (COT)
Subject: [Linux-cluster] 2 phy hosts (domain0) hosting 3 vm (domU) with
	clustered services between them and san storage
Message-ID: <18247.200.1.81.99.1224106972.squirrel@webmail.seaq.com.co>

Hi, I've narrowed down the problem to something similar if not the same as
posted recently on this list.

It seems that the problem lies within the LV creation.

My system is using RHEL 5.1, and when I manually created an LV what I
received was something like

Error locking on node node1 Volume group for uuid not found:
0PfAdiZHlULoLDX3gw3OGFrwsbPS8io1SWPEFXXOI0VzhYJLy8nGpBFdT7Oi25bF
Failed to activate new LV.

but the LV appeared on the creation node and, after a while (several
reboots), on the other node.

That leads me to think that my gfs2 problem lies there.

I've upgraded to RHEL 5.2 in order to use the lvm2 and kernel upgraded
packages that seemed to solve similar issues.

However, the error changed a bit, as well as the behaviour at the LV
creation. When I execute the lvcreate command I get this error

lvcreate -n ems_lv -L +5.99G ems_vg
  Rounding up size to full physical extent 5.99 GB
  Error locking on node ems88clu2.bvc.com.co: Error backing up metadata,
  can't find VG for group #global
  Aborting. Failed to activate new LV to wipe the start of it.

The difference with 5.1 is that previously the LV was created only at the
creation node; with 5.2, besides the different error message, the LV is
NOT created on either node.

This is happening inside a guest accessing the SAN as a virtual block
device presented by the domain-0 (the domain-0 exports the mpath device).
The nodes are in different domain-0s accessing the same SAN.

Thanks for your help

--
Andrés Mauricio Mujica Zalamea

From mockey.chen at nsn.com  Thu Oct 16 09:10:51 2008
From: mockey.chen at nsn.com (Chen, Mockey (NSN - CN/Cheng Du))
Date: Thu, 16 Oct 2008 17:10:51 +0800
Subject: [Linux-cluster] Two nodes cluster issue without shared storage issue
Message-ID: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net>

Hi,

I want to set up a two-node cluster, using active/standby mode to run my
service. I need that even if one node suffers a hardware failure such as a
power cut, the other node can still take over from the failed node and
provide the service. In my environment, I have no shared storage, so I
cannot use a quorum disk. 
Is there any other way to implement it? I searched and found 'tiebreaker IP' may feed my request, but I can not found any hints on how to configure it ? Any suggestion ? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bkyoung at gmail.com Thu Oct 16 14:52:50 2008 From: bkyoung at gmail.com (Brandon Young) Date: Thu, 16 Oct 2008 09:52:50 -0500 Subject: [Linux-cluster] GFS Tunables Message-ID: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> Hi all, I currently have a GFS deployment consisting of eight servers and several GFS volumes. One of my GFS servers is a dedicated backup server with a second replica SAN attached to it through a second HBA. My approach to backups has been with tools such as rsync and rdiff-backup, run on a nightly basis. I am having a particular problem with one or two of my filesystems taking a *very* long time to backup. For example, I have /home living on GFS. Day-to-day performance is acceptable, but backups are hideously slow. Every night, I kick off an rdiff-backup of /home from my backup server, which dumps the backup onto an XFS filesystem on the replica SAN. This backup can take days in some cases. We have done some investigating, and found that it appears that getdents(2) calls (which give the list of filenames present in a directory) are spectacularly slow on GFS, irrespective of the size of the directory in question. In particular, with 'strace -r', I'm seeing a rate below 100 filenames per second. The filesystem /home has at least 10 million files in it, which doing the math means 29.5 hours just to do the getdents calls to scan them, which is more than a third of wall-clock time. And that's before we even start stat'ing. I google'd around a bit and I can't see any discussion of slow getdents calls under GFS. Is there any chance we have some sort of tunable turned on/off that might be causing this? I'm not sure which tunables to consider tweaking, even. This seems awfully slow, even with sub-optimal locking. Is there perhaps some tunable I can try tweaking to improve this situation? Any insights would be much appreciated. -- Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Caetano at hp.com Thu Oct 16 15:05:13 2008 From: Greg.Caetano at hp.com (Caetano, Greg) Date: Thu, 16 Oct 2008 15:05:13 +0000 Subject: [Linux-cluster] RE: Fencing quandry In-Reply-To: <1224106943.2991.59.camel@dhcp80-204.msp.redhat.com> References: <38A48FA2F0103444906AD22E14F1B5A3082D8723@mailxchg01.corp.opsource.net> <38A48FA2F0103444906AD22E14F1B5A3082D8867@mailxchg01.corp.opsource.net> <1224106691.3367.19.camel@localhost.localdomain> <1224106943.2991.59.camel@dhcp80-204.msp.redhat.com> Message-ID: As mentioned the version of the ilo firmware caused some issues for cluster admins because additional features/commands were incorporated. This topic was discussed at the Red Hat Summit and a single command of "COLD_BOOT_SERVER" would perform a power off/wait 4 seconds/cold boot the server. 
This directive was suggested as a replacement for the "HOLD_PWR_BTN" directive in the scripts Greg Caetano Hewlett-Packard Company ESS Software Platform & Business Enablement Solutions Engineering Chicago, IL greg.caetano at hp.com Red Hat Certified Engineer RHCE#805007310328754 -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kevin Anderson Sent: Wednesday, October 15, 2008 4:42 PM To: linux clustering Subject: RE: [Linux-cluster] RE: Fencing quandry On Wed, 2008-10-15 at 17:38 -0400, jim parsons wrote: > On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote: > > Hello Jeff, > > > > RE: [Linux-cluster] RE: Fencing quandary > > > > The root issue is the ILO scripts are not up to date with the current firmware rev in the c-class and p-class blades. > > > > The method of '' for a "reboot" is not working with this ILO firmware rev and the workaround is to send 2 commands to ILO under a single method... 'action="off"/' and 'action="on"/'. > > > > I had tested this with my p-class blades and it was successful. I am still waiting for my customers test results on their c-class blades. > > > > ...yes this is the root issue to the ILO problem, but it does not completely address your concern. I believe you are saying: That the RHCS does not accept a "power off" as a fence, but is requiring both "power off" followed by "power on". > Right. It is failing because the 'power on' portion is not completing > because the fence agent is unable to send the correct power on command. > But the point is, even if the power on command fails, the fencing agent should report success, since the real need is to ensure the machine is no longer participating in the cluster and not bring it back up. So, is it proper to report success if part of the request fails as long as the critical part succeeds? Kevin -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From shawnlhood at gmail.com Thu Oct 16 15:29:23 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Thu, 16 Oct 2008 11:29:23 -0400 Subject: [Linux-cluster] fencing problem Message-ID: All, I'll provide some more config details a little later, but thought maybe some cursory information could yield a response. Simple four node cluster running RHEL4U7, latest RHEL cluster packages. Three GFS filesystems. This morning one of our nodes remained responsive, but was having some problems that required a reboot. Unfortunately, most commands from the command line were unsuccessful (Input/Output error, seems the root filesystem may have been remounted read only). I decided to fence the node from another node in the cluster -- using fence_node . This calls fence_drac. The operation returned successful, the node was fenced and rebooted. After this fencing operation, all nodes reporting their Membership state (as reported by cman_tool status) as Transition-Master. Per http://sources.redhat.com/cluster/faq.html#gfs_fencefreeze, I understand that GFS will freeze briefly after fencing is performed. The filesystems did not return to a responsive state. After many transition restarts, all nodes leave the cluster (as expected). Some logs and cluster.conf below. 
Shawn Oct 16 10:09:12 hugin fence_node[3512]: Fence of "munin" was successful Oct 16 10:09:32 hugin kernel: CMAN: removing node munin from the cluster : Missed too many heartbeats Oct 16 10:09:32 hugin kernel: CMAN: Initiating transition, generation 69 Oct 16 10:09:47 hugin kernel: CMAN: Initiating transition, generation 70 Oct 16 10:10:02 hugin kernel: CMAN: Initiating transition, generation 71 Oct 16 10:10:17 hugin kernel: CMAN: Initiating transition, generation 72 Oct 16 10:10:32 hugin kernel: CMAN: Initiating transition, generation 73 Oct 16 10:10:47 hugin kernel: CMAN: Initiating transition, generation 74 Oct 16 10:11:02 hugin kernel: CMAN: Initiating transition, generation 75 Oct 16 10:11:17 hugin kernel: CMAN: Initiating transition, generation 76 Oct 16 10:11:32 hugin kernel: CMAN: Initiating transition, generation 77 Oct 16 10:11:47 hugin kernel: CMAN: Initiating transition, generation 78 Oct 16 10:12:02 hugin kernel: CMAN: Initiating transition, generation 79 Oct 16 10:12:14 hugin kernel: CMAN: removing node odin from the cluster : Inconsistent cluster view Oct 16 10:12:14 hugin kernel: CMAN: Initiating transition, generation 80 Oct 16 10:12:14 hugin kernel: CMAN: removing node odin from the cluster : Inconsistent cluster view Oct 16 10:12:14 hugin kernel: CMAN: Initiating transition, generation 81 Oct 16 10:12:16 hugin kernel: CMAN: removing node zeus from the cluster : Inconsistent cluster view Oct 16 10:12:16 hugin kernel: CMAN: quorum lost, blocking activity Oct 16 10:12:16 hugin clurgmgrd[8799]: #1: Quorum Dissolved Oct 16 10:12:16 hugin kernel: CMAN: removing node zeus from the cluster : Inconsistent cluster view Oct 16 10:12:19 hugin ccsd[6330]: Cluster is not quorate. Refusing connection. Oct 16 10:12:19 hugin ccsd[6330]: Error while processing connect: Connection refused Oct 16 10:12:29 hugin ccsd[6330]: Cluster is not quorate. Refusing connection. Oct 16 10:12:29 hugin ccsd[6330]: Error while processing connect: Connection refused Oct 16 10:12:39 hugin ccsd[6330]: Cluster is not quorate. Refusing connection. Oct 16 10:13:47 hugin kernel: CMAN: node munin rejoining Oct 16 10:13:47 hugin kernel: CMAN: Completed transition, generation 81 Oct 16 10:13:49 hugin ccsd[6330]: Cluster is not quorate. Refusing connection. Oct 16 10:13:49 hugin ccsd[6330]: Error while processing connect: Connection refused -- previous error message repeated several times --- Another node in the same cluster, after fencing munin from hugin: Oct 16 10:09:31 zeus kernel: CMAN: removing node munin from the cluster : Missed too many heartbeats Oct 16 10:09:31 zeus kernel: CMAN: Initiating transition, generation 69 Oct 16 10:09:46 zeus kernel: CMAN: Initiating transition, generation 70 Oct 16 10:10:01 zeus kernel: CMAN: Initiating transition, generation 71 cluster.conf: /> /> /> /> From kanderso at redhat.com Thu Oct 16 15:40:23 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 16 Oct 2008 10:40:23 -0500 Subject: [Linux-cluster] fencing problem In-Reply-To: References: Message-ID: <1224171623.2982.14.camel@dhcp80-204.msp.redhat.com> Shawn, Not sure about your problem, but there is an issue with your cluster.conf file. You should remove this line: Since you have more than 2 nodes in your cluster. Looks like an artifact from running a two node cluster and then upgrading. When turning two_node off, you need to also remove the expected_votes setting as well. 
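For reference, the cman line left over from a two-node setup usually looks
something like the following (the exact attribute values here are an
assumption):

    <cman two_node="1" expected_votes="1"/>

With four nodes it should simply read:

    <cman/>

After editing, increase config_version at the top of cluster.conf and
propagate the change with "ccs_tool update /etc/cluster/cluster.conf".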
Kevin From shawnlhood at gmail.com Thu Oct 16 15:42:59 2008 From: shawnlhood at gmail.com (Shawn Hood) Date: Thu, 16 Oct 2008 11:42:59 -0400 Subject: [Linux-cluster] fencing problem In-Reply-To: <1224171623.2982.14.camel@dhcp80-204.msp.redhat.com> References: <1224171623.2982.14.camel@dhcp80-204.msp.redhat.com> Message-ID: It is indeed an artifact of times past. Thanks for pointing this out! Shawn On Thu, Oct 16, 2008 at 11:40 AM, Kevin Anderson wrote: > Shawn, > > Not sure about your problem, but there is an issue with your > cluster.conf file. You should remove this line: > > > > Since you have more than 2 nodes in your cluster. Looks like an > artifact from running a two node cluster and then upgrading. When > turning two_node off, you need to also remove the expected_votes setting > as well. > > Kevin > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Shawn Hood 910.670.1819 m From s.wendy.cheng at gmail.com Thu Oct 16 15:50:53 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 16 Oct 2008 10:50:53 -0500 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> Message-ID: <48F762DD.8040102@gmail.com> Brandon Young wrote: > Hi all, > > I currently have a GFS deployment consisting of eight servers and > several GFS volumes. One of my GFS servers is a dedicated backup > server with a second replica SAN attached to it through a second HBA. > My approach to backups has been with tools such as rsync and > rdiff-backup, run on a nightly basis. I am having a particular > problem with one or two of my filesystems taking a *very* long time to > backup. For example, I have /home living on GFS. Day-to-day > performance is acceptable, but backups are hideously slow. Every > night, I kick off an rdiff-backup of /home from my backup server, > which dumps the backup onto an XFS filesystem on the replica SAN. > This backup can take days in some cases. Not only GFS, the "getdents()" has been more than annoying on many filesystems if entries count within the directory is high - but, yes, GFS is particularly bloody slow with its directory read. There have been efforts contributed by Red Hat POSIX and LIBC folks to have new standardized light-weight directory operations. Unfortunately I lost tracks of their progress ... On the other hand, integrating these new calls into GFS would take time anyway (if they are available) - so unlikely it can meet your need. There were also few experimental GFS patches but none of them made into the production code. Unless other GFS folks can give you more ideas, I think your best bet at this moment is to think "outside" the box. That is, don't do file-to-file backup if all possible. Check out other block level backup strategies. Are Linux LVM mirroring and/or snapshots workable for you ? Does your SAN vendor provide embedded features (e.g. Netapp SAN box offers snapshot, snapmirror, syncmirror, etc) ? -- Wendy > > We have done some investigating, and found that it appears that > getdents(2) calls (which give the list of filenames present in a > directory) are spectacularly slow on GFS, irrespective of the size of > the directory in question. In particular, with 'strace -r', I'm > seeing a rate below 100 filenames per second. 
The filesystem /home > has at least 10 million files in it, which doing the math means 29.5 > hours just to do the getdents calls to scan them, which is more than a > third of wall-clock time. And that's before we even start stat'ing. > > I google'd around a bit and I can't see any discussion of slow > getdents calls under GFS. Is there any chance we have some sort of > tunable turned on/off that might be causing this? I'm not sure which > tunables to consider tweaking, even. This seems awfully slow, even > with sub-optimal locking. Is there perhaps some tunable I can try > tweaking to improve this situation? Any insights would be much > appreciated. > > -- > Brandon > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From jos at xos.nl Thu Oct 16 16:30:56 2008 From: jos at xos.nl (Jos Vos) Date: Thu, 16 Oct 2008 18:30:56 +0200 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <48F762DD.8040102@gmail.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> Message-ID: <20081016163056.GA14934@jasmine.xos.nl> On Thu, Oct 16, 2008 at 10:50:53AM -0500, Wendy Cheng wrote: > Unless other GFS folks can give you more ideas, I think your best bet at > this moment is to think "outside" the box. That is, don't do > file-to-file backup if all possible. Check out other block level backup > strategies. Are Linux LVM mirroring and/or snapshots workable for you ? > Does your SAN vendor provide embedded features (e.g. Netapp SAN box > offers snapshot, snapmirror, syncmirror, etc) ? What about GFS2? We have similar problems, using GFS on a ftp server, where (for example) doing rsync's is almost impossible for large trees. We tried some of the tuning suggestions you made in earlier mails and on your web pages on RHEL 5.l, but none of them had a substantial effect, only the tuning for making "df" more responsive worked. We (while already having put part of our volumes on ext3 with NFS, a situation that is far from ideal for the cluster) are about to do some new tests. One of the is trying GFS2 on one volume. I'd appreciate if you can summarize (references to) the current (RHEL 5.2) tuning possibilities for GFS. If there is nothing new, we want to start a test with GFS2. -- -- Jos Vos -- X/OS Experts in Open Systems BV | Phone: +31 20 6938364 -- Amsterdam, The Netherlands | Fax: +31 20 6948204 From gordan at bobich.net Thu Oct 16 16:44:20 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 16 Oct 2008 17:44:20 +0100 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <20081016163056.GA14934@jasmine.xos.nl> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <20081016163056.GA14934@jasmine.xos.nl> Message-ID: <48F76F64.80704@bobich.net> Jos Vos wrote: > On Thu, Oct 16, 2008 at 10:50:53AM -0500, Wendy Cheng wrote: > >> Unless other GFS folks can give you more ideas, I think your best bet at >> this moment is to think "outside" the box. That is, don't do >> file-to-file backup if all possible. Check out other block level backup >> strategies. Are Linux LVM mirroring and/or snapshots workable for you ? >> Does your SAN vendor provide embedded features (e.g. Netapp SAN box >> offers snapshot, snapmirror, syncmirror, etc) ? > > What about GFS2? 
> > We have similar problems, using GFS on a ftp server, where (for example) > doing rsync's is almost impossible for large trees. > > We tried some of the tuning suggestions you made in earlier mails and on > your web pages on RHEL 5.l, but none of them had a substantial effect, > only the tuning for making "df" more responsive worked. > > We (while already having put part of our volumes on ext3 with NFS, a > situation that is far from ideal for the cluster) are about to do some > new tests. One of the is trying GFS2 on one volume. > > I'd appreciate if you can summarize (references to) the current > (RHEL 5.2) tuning possibilities for GFS. If there is nothing new, > we want to start a test with GFS2. Since you're experimenting, OCFS2 might be worth trying, on the offchance that it works better for your specific usage pattern. Gordan From jos at xos.nl Thu Oct 16 17:58:18 2008 From: jos at xos.nl (Jos Vos) Date: Thu, 16 Oct 2008 19:58:18 +0200 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <48F76F64.80704@bobich.net> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <20081016163056.GA14934@jasmine.xos.nl> <48F76F64.80704@bobich.net> Message-ID: <20081016175818.GA16742@jasmine.xos.nl> On Thu, Oct 16, 2008 at 05:44:20PM +0100, Gordan Bobic wrote: > Since you're experimenting, OCFS2 might be worth trying, on the > offchance that it works better for your specific usage pattern. Not that kind of experimenting ;-), I want it to be based on RHEL5. -- -- Jos Vos -- X/OS Experts in Open Systems BV | Phone: +31 20 6938364 -- Amsterdam, The Netherlands | Fax: +31 20 6948204 From gordan at bobich.net Thu Oct 16 18:16:32 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 16 Oct 2008 19:16:32 +0100 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <20081016175818.GA16742@jasmine.xos.nl> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <20081016163056.GA14934@jasmine.xos.nl> <48F76F64.80704@bobich.net> <20081016175818.GA16742@jasmine.xos.nl> Message-ID: <48F78500.5000100@bobich.net> Jos Vos wrote: > On Thu, Oct 16, 2008 at 05:44:20PM +0100, Gordan Bobic wrote: > >> Since you're experimenting, OCFS2 might be worth trying, on the >> offchance that it works better for your specific usage pattern. > > Not that kind of experimenting ;-), I want it to be based on RHEL5. OCFS2 will run on RHEL5, and there is no need to abandon RHCS. It's just a different file system, and there are RPMs for RHEL5 available. Gordan From bkyoung at gmail.com Thu Oct 16 20:21:05 2008 From: bkyoung at gmail.com (Brandon Young) Date: Thu, 16 Oct 2008 15:21:05 -0500 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <48F762DD.8040102@gmail.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> Message-ID: <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> Wendy, We have searched high and low for an alternative to file-to-file backups, especially looking block level backups. The only product we've found that "supports" GFS is Bak Bone Replicator. My first crack at installing it was late last week. The experience was worrisome. The replicator service inserts a kernel module, which by itself is livable; but in our particular case, we found a changed behavior in error codes the kernel returns for things like non existent files, while this module is loaded. 
Ultimately, if the kernel module was the root cause of that behavior (we're still investigating), that's unworkable. As for LVM snapshotting ... I am under the impression that those features are unavailable in GFS (and are slated for GFS2? Which is not "production ready", yet?) It has certainly occured to me to try that feature, if only it were available. Am I misinformed? Perhaps I need some more education on how exactly LVM mirroring will help me. I am *attempting* to approximate a traditional backup scheme, atleast on this particular filesystem. Am I correct in believing that I could snapshot a volume (assuming the feature is available) and run a traditional backup (using, say, rdiff-backup) in a shorter time than I can now, where I'm running it straight off a live GFS volume? -- Brandon On Thu, Oct 16, 2008 at 10:50 AM, Wendy Cheng wrote: > Brandon Young wrote: > >> Hi all, >> >> I currently have a GFS deployment consisting of eight servers and several >> GFS volumes. One of my GFS servers is a dedicated backup server with a >> second replica SAN attached to it through a second HBA. My approach to >> backups has been with tools such as rsync and rdiff-backup, run on a nightly >> basis. I am having a particular problem with one or two of my filesystems >> taking a *very* long time to backup. For example, I have /home living on >> GFS. Day-to-day performance is acceptable, but backups are hideously slow. >> Every night, I kick off an rdiff-backup of /home from my backup server, >> which dumps the backup onto an XFS filesystem on the replica SAN. This >> backup can take days in some cases. >> > > Not only GFS, the "getdents()" has been more than annoying on many > filesystems if entries count within the directory is high - but, yes, > GFS is particularly bloody slow with its directory read. There have been > efforts contributed by Red Hat POSIX and LIBC folks to have new > standardized light-weight directory operations. Unfortunately I lost > tracks of their progress ... On the other hand, integrating these new > calls into GFS would take time anyway (if they are available) - so > unlikely it can meet your need. There were also few experimental GFS > patches but none of them made into the production code. > > Unless other GFS folks can give you more ideas, I think your best bet at > this moment is to think "outside" the box. That is, don't do > file-to-file backup if all possible. Check out other block level backup > strategies. Are Linux LVM mirroring and/or snapshots workable for you ? > Does your SAN vendor provide embedded features (e.g. Netapp SAN box > offers snapshot, snapmirror, syncmirror, etc) ? > > -- Wendy > > >> We have done some investigating, and found that it appears that >> getdents(2) calls (which give the list of filenames present in a directory) >> are spectacularly slow on GFS, irrespective of the size of the directory in >> question. In particular, with 'strace -r', I'm seeing a rate below 100 >> filenames per second. The filesystem /home has at least 10 million files in >> it, which doing the math means 29.5 hours just to do the getdents calls to >> scan them, which is more than a third of wall-clock time. And that's before >> we even start stat'ing. >> >> I google'd around a bit and I can't see any discussion of slow getdents >> calls under GFS. Is there any chance we have some sort of tunable turned >> on/off that might be causing this? I'm not sure which tunables to consider >> tweaking, even. This seems awfully slow, even with sub-optimal locking. 
Is >> there perhaps some tunable I can try tweaking to improve this situation? >> Any insights would be much appreciated. >> >> -- >> Brandon >> ------------------------------------------------------------------------ >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Thu Oct 16 20:29:52 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 16 Oct 2008 21:29:52 +0100 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> Message-ID: <48F7A440.9050604@bobich.net> Brandon Young wrote: > As for LVM snapshotting ... I am under the impression that those > features are unavailable in GFS (and are slated for GFS2? Which is not > "production ready", yet?) It has certainly occured to me to try that > feature, if only it were available. Am I misinformed? Perhaps I need > some more education on how exactly LVM mirroring will help me. I am > *attempting* to approximate a traditional backup scheme, atleast on this > particular filesystem. Am I correct in believing that I could snapshot > a volume (assuming the feature is available) and run a traditional > backup (using, say, rdiff-backup) in a shorter time than I can now, > where I'm running it straight off a live GFS volume? You can use CLVM (Cluser aware LVM) and create GFS on top of that volume. You can them use CLVM to take a snapshot of the block device, mount it read-only with lock_nolock and back that up. That should go at non-clustered FS speeds. Gordan From siddiqut at gmail.com Thu Oct 16 20:32:31 2008 From: siddiqut at gmail.com (Tajdar Siddiqui) Date: Thu, 16 Oct 2008 16:32:31 -0400 Subject: [Linux-cluster] gfs cluster server question Message-ID: <3abaa1ce0810161332u38e20545j9fd03cc6169eeeea@mail.gmail.com> I apologize in advance i have not worded my question correctly: What kind of issues need to be considered is some of the servers connecting to gfs san are spread over WAN. Assume that data will be written/read by all the servers in the cluster and also that there will be cross talk: meaning data written by Server1 which connects to SAN over WAN will be read by Server 2 which connects to SAN over LAN and vice versa. Thanx, Tajdar -------------- next part -------------- An HTML attachment was scrubbed... URL: From kanderso at redhat.com Thu Oct 16 20:35:02 2008 From: kanderso at redhat.com (Kevin Anderson) Date: Thu, 16 Oct 2008 15:35:02 -0500 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <48F7A440.9050604@bobich.net> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> <48F7A440.9050604@bobich.net> Message-ID: <1224189302.2982.52.camel@dhcp80-204.msp.redhat.com> On Thu, 2008-10-16 at 21:29 +0100, Gordan Bobic wrote: > Brandon Young wrote: > > > As for LVM snapshotting ... I am under the impression that those > > features are unavailable in GFS (and are slated for GFS2? Which is not > > "production ready", yet?) It has certainly occured to me to try that > > feature, if only it were available. 
Am I misinformed? Perhaps I need > > some more education on how exactly LVM mirroring will help me. I am > > *attempting* to approximate a traditional backup scheme, atleast on this > > particular filesystem. Am I correct in believing that I could snapshot > > a volume (assuming the feature is available) and run a traditional > > backup (using, say, rdiff-backup) in a shorter time than I can now, > > where I'm running it straight off a live GFS volume? > > You can use CLVM (Cluser aware LVM) and create GFS on top of that > volume. You can them use CLVM to take a snapshot of the block device, > mount it read-only with lock_nolock and back that up. That should go at > non-clustered FS speeds. > We don't have support for cluster snapshots as of yet even though it has been on the todo list for about 5 years now :(. Kevin From gordan at bobich.net Thu Oct 16 20:47:50 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 16 Oct 2008 21:47:50 +0100 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <1224189302.2982.52.camel@dhcp80-204.msp.redhat.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> <48F7A440.9050604@bobich.net> <1224189302.2982.52.camel@dhcp80-204.msp.redhat.com> Message-ID: <48F7A876.6010300@bobich.net> Kevin Anderson wrote: > On Thu, 2008-10-16 at 21:29 +0100, Gordan Bobic wrote: >> Brandon Young wrote: >> >>> As for LVM snapshotting ... I am under the impression that those >>> features are unavailable in GFS (and are slated for GFS2? Which is not >>> "production ready", yet?) It has certainly occured to me to try that >>> feature, if only it were available. Am I misinformed? Perhaps I need >>> some more education on how exactly LVM mirroring will help me. I am >>> *attempting* to approximate a traditional backup scheme, atleast on this >>> particular filesystem. Am I correct in believing that I could snapshot >>> a volume (assuming the feature is available) and run a traditional >>> backup (using, say, rdiff-backup) in a shorter time than I can now, >>> where I'm running it straight off a live GFS volume? >> You can use CLVM (Cluser aware LVM) and create GFS on top of that >> volume. You can them use CLVM to take a snapshot of the block device, >> mount it read-only with lock_nolock and back that up. That should go at >> non-clustered FS speeds. >> > We don't have support for cluster snapshots as of yet even though it has > been on the todo list for about 5 years now :(. Joy... My mistake. Sorry I mentioned it. Gordan From swplotner at amherst.edu Thu Oct 16 21:22:04 2008 From: swplotner at amherst.edu (Steffen Plotner) Date: Thu, 16 Oct 2008 17:22:04 -0400 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <48F7A876.6010300@bobich.net> Message-ID: > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Gordan Bobic > Sent: Thursday, October 16, 2008 4:48 PM > To: linux clustering > Subject: Re: [Linux-cluster] GFS Tunables > > Kevin Anderson wrote: > > On Thu, 2008-10-16 at 21:29 +0100, Gordan Bobic wrote: > >> Brandon Young wrote: > >> > >>> As for LVM snapshotting ... I am under the impression that those > >>> features are unavailable in GFS (and are slated for GFS2? > Which is > >>> not "production ready", yet?) It has certainly occured > to me to try > >>> that feature, if only it were available. Am I > misinformed? 
Perhaps > >>> I need some more education on how exactly LVM mirroring will help > >>> me. I am > >>> *attempting* to approximate a traditional backup scheme, > atleast on > >>> this particular filesystem. Am I correct in believing > that I could > >>> snapshot a volume (assuming the feature is available) and run a > >>> traditional backup (using, say, rdiff-backup) in a > shorter time than > >>> I can now, where I'm running it straight off a live GFS volume? > >> You can use CLVM (Cluser aware LVM) and create GFS on top of that > >> volume. You can them use CLVM to take a snapshot of the > block device, > >> mount it read-only with lock_nolock and back that up. That > should go > >> at non-clustered FS speeds. > >> > > We don't have support for cluster snapshots as of yet even > though it > > has been on the todo list for about 5 years now :(. > > Joy... My mistake. Sorry I mentioned it. > How about snapshotting at the backend storage device? We usually use linux as the backed, hand out storage via iscsi and snapshot at the backend - this eliminates the need for doing snaps at the GFS level - I agree that if there was snapshotting at the LVM/GFS level we could get a clean snapshot.... another problem.. > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From gordan at bobich.net Thu Oct 16 21:42:07 2008 From: gordan at bobich.net (Gordan Bobic) Date: Thu, 16 Oct 2008 22:42:07 +0100 Subject: [Linux-cluster] GFS Tunables In-Reply-To: References: Message-ID: <48F7B52F.4000907@bobich.net> Steffen Plotner wrote: >> Kevin Anderson wrote: >>> On Thu, 2008-10-16 at 21:29 +0100, Gordan Bobic wrote: >>>> Brandon Young wrote: >>>> >>>>> As for LVM snapshotting ... I am under the impression that those >>>>> features are unavailable in GFS (and are slated for GFS2? >> Which is >>>>> not "production ready", yet?) It has certainly occured >> to me to try >>>>> that feature, if only it were available. Am I >> misinformed? Perhaps >>>>> I need some more education on how exactly LVM mirroring will help >>>>> me. I am >>>>> *attempting* to approximate a traditional backup scheme, >> atleast on >>>>> this particular filesystem. Am I correct in believing >> that I could >>>>> snapshot a volume (assuming the feature is available) and run a >>>>> traditional backup (using, say, rdiff-backup) in a >> shorter time than >>>>> I can now, where I'm running it straight off a live GFS volume? >>>> You can use CLVM (Cluser aware LVM) and create GFS on top of that >>>> volume. You can them use CLVM to take a snapshot of the >> block device, >>>> mount it read-only with lock_nolock and back that up. That >> should go >>>> at non-clustered FS speeds. >>>> >>> We don't have support for cluster snapshots as of yet even >> though it >>> has been on the todo list for about 5 years now :(. >> Joy... My mistake. Sorry I mentioned it. >> > > How about snapshotting at the backend storage device? We usually use > linux as the backed, hand out storage via iscsi and snapshot at the > backend - this eliminates the need for doing snaps at the GFS level - I > agree that if there was snapshotting at the LVM/GFS level we could get a > clean snapshot.... another problem.. The problem with that being that you have your storage system as a single point of failure which rather defeats the point of clustering. Gordan From pbruna at it-linux.cl Fri Oct 17 04:03:14 2008 From: pbruna at it-linux.cl (Patricio A. 
Bruna) Date: Fri, 17 Oct 2008 01:03:14 -0300 (CLST) Subject: [Linux-cluster] Email alert Message-ID: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> Its possible to configure Cluster Suite to send an email when a service change host or faild to failover? ------------------------------------ Patricio Bruna V. IT Linux Ltda. http://www.it-linux.cl Fono : (+56-2) 333 0578 - Chile Fono: (+54-11) 6632 2760 - Argentina M?vil : (+56-09) 8827 0342 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at netspot.com.au Fri Oct 17 04:58:06 2008 From: tom at netspot.com.au (Tom Lanyon) Date: Fri, 17 Oct 2008 15:28:06 +1030 Subject: [Linux-cluster] GFS Tunables In-Reply-To: <1224189302.2982.52.camel@dhcp80-204.msp.redhat.com> References: <824ffea00810160752l1aa6df1el68032eba4ac87df5@mail.gmail.com> <48F762DD.8040102@gmail.com> <824ffea00810161321n569e4fd3wa707c96d884588e2@mail.gmail.com> <48F7A440.9050604@bobich.net> <1224189302.2982.52.camel@dhcp80-204.msp.redhat.com> Message-ID: On 17/10/2008, at 7:05 AM, Kevin Anderson wrote: > We don't have support for cluster snapshots as of yet even though it > has > been on the todo list for about 5 years now :(. > > Kevin I asked recently on the lvm list what the status of CLVM snapshotting was and got no response... :) From macscr at macscr.com Fri Oct 17 05:19:06 2008 From: macscr at macscr.com (Mark Chaney) Date: Fri, 17 Oct 2008 00:19:06 -0500 Subject: [Linux-cluster] Email alert In-Reply-To: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> References: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> Message-ID: <028901c93017$e230f490$a692ddb0$@com> No, but you can use a monitoring service like nagios to do that. From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patricio A. Bruna Sent: Thursday, October 16, 2008 11:03 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] Email alert Its possible to configure Cluster Suite to send an email when a service change host or faild to failover? ------------------------------------ Patricio Bruna V. IT Linux Ltda. http://www.it-linux.cl Fono : (+56-2) 333 0578 - Chile Fono: (+54-11) 6632 2760 - Argentina M?vil : (+56-09) 8827 0342 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsd_daemon at msn.com Fri Oct 17 07:29:29 2008 From: bsd_daemon at msn.com (Mehmet CELIK) Date: Fri, 17 Oct 2008 07:29:29 +0000 Subject: [Linux-cluster] Email alert In-Reply-To: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> References: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> Message-ID: Its not possible, right now. Maybe in the future.. But, you can use the mon service.-- Mehmet CELIK Istanbul/TURKEY Date: Fri, 17 Oct 2008 01:03:14 -0300From: pbruna at it-linux.clTo: linux-cluster at redhat.comSubject: [Linux-cluster] Email alert Its possible to configure Cluster Suite to send an email when a service change host or faild to failover?------------------------------------Patricio Bruna V.IT Linux Ltda.http://www.it-linux.clFono : (+56-2) 333 0578 - ChileFono: (+54-11) 6632 2760 - ArgentinaM?vil : (+56-09) 8827 0342 _________________________________________________________________ Store, manage and share up to 5GB with Windows Live SkyDrive. http://skydrive.live.com/welcome.aspx?provision=1?ocid=TXT_TAGLM_WL_skydrive_102008 -------------- next part -------------- An HTML attachment was scrubbed... 
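Lon Hohberger notes further down this thread that syslog-ng can be configured to send mail on failover events; a simpler, monitoring-style alternative in the spirit of the nagios and mon suggestions above is to poll clustat from a small script and mail any change. The sketch below is untested and purely illustrative: the poll interval, recipient address and state-file path are arbitrary placeholders, and it assumes clustat and a working mail command are present on the node.

#!/bin/bash
# Hypothetical watcher: mail a diff whenever rgmanager service state
# or ownership changes. Not part of the cluster suite.
STATE=/var/run/clustat.last
MAILTO=root@example.com

while true; do
    clustat > /tmp/clustat.now 2>/dev/null
    if [ -f "$STATE" ] && ! diff -q "$STATE" /tmp/clustat.now >/dev/null; then
        diff -u "$STATE" /tmp/clustat.now | \
            mail -s "cluster status change on $(hostname)" "$MAILTO"
    fi
    mv /tmp/clustat.now "$STATE"
    sleep 30
done

Because it only reads clustat output, such a watcher can run on any (or every) member node without interfering with rgmanager's own failover handling.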
URL: From federico.simoncelli at gmail.com Fri Oct 17 11:27:02 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Fri, 17 Oct 2008 13:27:02 +0200 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) Message-ID: Hi all, I am managing a two-node cluster without qdisk and to avoid the fencing loop I thought to implement a startup quorum. The startup quorum would be the minimum total votes a node requires to remain in the cluster after the startup process. When cman is started on a node it joins the cluster and if the total votes are less than the CMAN_STARTUP_QUORUM it leaves preventing fencing to be executed. I configured my CMAN_STARTUP_QUORUM to 2: - When the two nodes joins the cluster together the total votes are 2; everything is normal. - When a node get fenced the remaining one is in quorate (cman real quorum: 1). - When the fenced node boots up and finds the other node the total votes are 2; everything is normal. - When the fenced node boots up and doesn't find the other node the total votes are 1 (< CMAN_STARTUP_QUORUM); the node leaves the cluster, stop cman and prevent fencing. This might be handy for booting up a remote node for maintenance and not being worried about fencing loops. The downside is that you can't boot a single node and having it working alone; this situation can be considered an emergency and can be handled manually. The startup quorum might resolve also: https://www.redhat.com/archives/linux-cluster/2008-June/msg00143.html https://bugzilla.redhat.com/show_bug.cgi?id=452234 Patch in attachment. Can anyone review it? Is anyone interested to integrate this same behaviour into cman and the cluster.conf? Ex: Thanks, -- Federico. -------------- next part -------------- A non-text attachment was scrubbed... Name: cman_startup_quorum.patch Type: application/octet-stream Size: 979 bytes Desc: not available URL: From jmacfarland at nexatech.com Fri Oct 17 13:36:04 2008 From: jmacfarland at nexatech.com (Jeff Macfarland) Date: Fri, 17 Oct 2008 08:36:04 -0500 Subject: [Linux-cluster] Email alert In-Reply-To: <028901c93017$e230f490$a692ddb0$@com> References: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> <028901c93017$e230f490$a692ddb0$@com> Message-ID: <48F894C4.9010506@nexatech.com> Mark Chaney wrote: > No, but you can use a monitoring service like nagios to do that. Is "RIND" (http://sources.redhat.com/cluster/wiki/EventScripting) not applicable? Or, if implemented in , will it prevent the system from automated failover of services, etc? I dunno much about slang, but it looks like it at least supports system() for a quick email if nothing else. > > > > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Patricio A. Bruna > *Sent:* Thursday, October 16, 2008 11:03 PM > *To:* linux-cluster at redhat.com > *Subject:* [Linux-cluster] Email alert > > > > Its possible to configure Cluster Suite to send an email when a service > change host or faild to failover? > > ------------------------------------ > Patricio Bruna V. > IT Linux Ltda. 
> http://www.it-linux.cl > Fono : (+56-2) 333 0578 - Chile > Fono: (+54-11) 6632 2760 - Argentina > M?vil : (+56-09) 8827 0342 > -- Jeff Macfarland (jmacfarland at nexatech.com) Nexa Technologies - 972.747.8879 Systems Administrator GPG Key ID: 0x5F1CA61B GPG Key Server: hkp://wwwkeys.pgp.net From teigland at redhat.com Fri Oct 17 15:41:42 2008 From: teigland at redhat.com (David Teigland) Date: Fri, 17 Oct 2008 10:41:42 -0500 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: References: Message-ID: <20081017154142.GE3299@redhat.com> On Fri, Oct 17, 2008 at 01:27:02PM +0200, Federico Simoncelli wrote: > Hi all, > I am managing a two-node cluster without qdisk and to avoid the > fencing loop I thought to implement a startup quorum. Sounds like you want quorum of 2 when nodes are joining, but quorum of 1 after a node fails. That sounds reasonable. To do that manually, you *don't* set two_node/expected_votes in cluster.conf, and then manually run cman_tool expected -e 1 after a node fails. Here's another possibility I hadn't thought of before: . don't set two_node/expteced_votes in cluster.conf . edit init.d/cman and possibly /etc/sysconfig/cman to do cman_tool join -w (joins cluster and waits to be a member) cman_tool wait -q (waits for quorum, both nodes to be members) cman_tool expected -e 1 (change expected votes to 1) The effects of this will be: . a node needs to see the other to get quorum and start up . after both nodes see each other, if one fails, the other will fence it and continue . after both nodes see each other, if they become partitioned, they will race to fence each other . if both nodes are restarted while they are still partitioned, neither of them will be able to start Dave From federico.simoncelli at gmail.com Fri Oct 17 16:11:53 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Fri, 17 Oct 2008 18:11:53 +0200 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: <20081017154142.GE3299@redhat.com> References: <20081017154142.GE3299@redhat.com> Message-ID: On Fri, Oct 17, 2008 at 5:41 PM, David Teigland wrote: > Here's another possibility I hadn't thought of before: > > . don't set two_node/expteced_votes in cluster.conf > . edit init.d/cman and possibly /etc/sysconfig/cman to do > cman_tool join -w (joins cluster and waits to be a member) > cman_tool wait -q (waits for quorum, both nodes to be members) > cman_tool expected -e 1 (change expected votes to 1) I tried this before but the downsides were: - long waits due not being in quorum (fence timeout is 600 seconds by default) - you have to rewrite the cluster.conf to make it work - break legacies with distros/systems not using this init script Let me know what you think about the problems I listed above because this solution looks much cleaner. Thanks, -- Federico. From teigland at redhat.com Fri Oct 17 16:09:23 2008 From: teigland at redhat.com (David Teigland) Date: Fri, 17 Oct 2008 11:09:23 -0500 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: References: <20081017154142.GE3299@redhat.com> Message-ID: <20081017160923.GG3299@redhat.com> On Fri, Oct 17, 2008 at 06:11:53PM +0200, Federico Simoncelli wrote: > On Fri, Oct 17, 2008 at 5:41 PM, David Teigland wrote: > > Here's another possibility I hadn't thought of before: > > > > . don't set two_node/expteced_votes in cluster.conf > > . 
edit init.d/cman and possibly /etc/sysconfig/cman to do > > cman_tool join -w (joins cluster and waits to be a member) > > cman_tool wait -q (waits for quorum, both nodes to be members) > > cman_tool expected -e 1 (change expected votes to 1) > > I tried this before but the downsides were: > > - long waits due not being in quorum (fence timeout is 600 seconds by > default) cman_tool join -w can will possibly wait a long time if the other node is not up or is partitioned. That's the price you pay for avoiding the potential back-and-forth fencing loop. > - you have to rewrite the cluster.conf to make it work I don't know what you're refering to. You simply don't include the two_node/expected_votes line. > - break legacies with distros/systems not using this init script Yes, you have to hack the init script. There's a change coming in RHEL5.3 that's very similar to this, where you won't need to hack anything: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=5ea416d26ec2b6bf605c573a5173736d0f8cd27c http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=397b8111d2d69b9dd25e7b074822be571f274032 From federico.simoncelli at gmail.com Fri Oct 17 16:56:32 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Fri, 17 Oct 2008 18:56:32 +0200 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: <20081017160923.GG3299@redhat.com> References: <20081017154142.GE3299@redhat.com> <20081017160923.GG3299@redhat.com> Message-ID: On Fri, Oct 17, 2008 at 6:09 PM, David Teigland wrote: >> I tried this before but the downsides were: >> >> - long waits due not being in quorum (fence timeout is 600 seconds by >> default) > > cman_tool join -w can will possibly wait a long time if the other node > is not up or is partitioned. That's the price you pay for avoiding the > potential back-and-forth fencing loop. I'll pay :-) I quickly made a patch (in attachment). Tested with: # cat /etc/sysconfig/cman CMAN_QUORUM_TIMEOUT=10 CMAN_EXPECTED_QUORUM=1 Working fine for now. More testing after the weekend. :-) Comments are welcome. -- Federico. -------------- next part -------------- A non-text attachment was scrubbed... Name: cman_expected_quorum.patch Type: application/octet-stream Size: 858 bytes Desc: not available URL: From virginian at blueyonder.co.uk Sat Oct 18 12:42:24 2008 From: virginian at blueyonder.co.uk (Virginian) Date: Sat, 18 Oct 2008 13:42:24 +0100 Subject: [Linux-cluster] Strange error messages in /var/log/messages References: <1224085345.3277.2.camel@localhost.localdomain><1224085725.3277.4.camel@localhost.localdomain> <1224098289.5912.5.camel@ayanami> Message-ID: <5108FACF53854CBF98D6CA1C01542AEC@Desktop> Hi Lon, I see the attached patch but I am not a programmer so I am not sure what it means? Thanks John ----- Original Message ----- From: "Lon Hohberger" To: "linux clustering" Sent: Wednesday, October 15, 2008 8:18 PM Subject: Re: [Linux-cluster] Strange error messages in /var/log/messages > On Wed, 2008-10-15 at 11:48 -0400, jim parsons wrote: >> On Wed, 2008-10-15 at 11:42 -0400, jim parsons wrote: >> > On Wed, 2008-10-15 at 16:01 +0100, Virginian wrote: >> > >> > > >> > >> > This tag does not need to be in the inner clusters' (dom u cluster) >> > conf file, only the cluster set up on the physical hosts. >> > >> > That might be the problem - easy enough to check! :) >> >> It would be fun to know if the above fixes the issue. Please let me >> know. > > I think I see it. 
> > -- Lon > -------------------------------------------------------------------------------- > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fdinitto at redhat.com Mon Oct 20 10:07:26 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 20 Oct 2008 12:07:26 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.11 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 2.99.11 release from the master branch. Important note: If you are running 2.99.xx series, please upgrade immediatly to this version. This release addresses a major bug in GFS1 kernel module and also contains a security fix for fence_egenera. The development cycle for 3.0 is proceeding at a very good speed and mostlikely one of the next releases will be 3.0alpha1. All features designed for 3.0 are being completed and taking a proper shape, the library API has been stable for sometime (and will soon be marked as 3.0 soname). Stay tuned for upcoming updates! The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.11 release you will need: - - corosync svn r1667. - - openais svn r1651. - - linux kernel (2.6.27) The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.11.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.99.11.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.10): Abhijith Das (5): gfs-kernel: GFS: madvise system call causes assertion gfs-kernel: bz457473 - GFS ignore the noatime and nodiratime mount options Revert "gfs-kernel: bz457473 - GFS ignore the noatime and nodiratime mount options" gfs-kernel and mount.gfs2: GFS ignore the noatime and nodiratime mount options gfs-kernel: bz 458765 - In linux-2.6.26 / 2.03.06, GFS1 can't create more than 4kb file Bob Peterson (2): GFS: gfs_fsck invalid response to question changes the question gfs-kmod: GFS corruption after forced withdraw Christine Caulfield (1): cman: fix a couple of unhandled malloc failures David Teigland (8): dlm_controld: add protocol negotiation fenced: add protocol negotiation fenced/fence_tool: improve list info fence_tool/dlm_tool/gfs_control: remove error message daemons/tools: misc minor cleanups and improvements dlm/fence: daemon fixes and tool improvements gfs_control: improve ls output fenced/dlm_controld/gfs_controld: modify a debug message Fabio M. 
Di Nitto (2): fence egenera: fix logging file rgmanager: fix build after port to logsys Jan Friesse (1): fence: New fence agent for Logical Domains (LDOMs) Lon Hohberger (5): rgmanager: First pass at port to logsys group: Allow group_tool ls to be scriptable rgmanager: make clulog build even though it's incomplete rgmanager: don't change the build target just yet [fence] Fix fence_xvmd trying to read wrong args from ccs Marek 'marx' Grac (3): [FENCE] Fix #290231 - "Switch (optional)" param does not default to "1" and program fails [RGMANAGER] - Fix #462910 postgres-8.sh and metadata fixes [fence] Operation 'list' for APC fence agent Ryan McCabe (1): cman: Fix typo that caused start-up to fail Ryan O'Hara (3): cman: allow custom xen network bridge scripts fence_scsi: improve logging for debugging fence_scsi: correctly declare key_list Steven Whitehouse (1): libgfs2: Add support for UUID generation to gfs2_mkfs rohara (2): fence_scsi.pl: check if nodeid is zero scsi_reserve: add restart option cman/daemon/daemon.c | 2 +- cman/init.d/cman | 17 +- cman/lib/libcman.c | 2 + dlm/libdlmcontrol/libdlmcontrol.h | 1 - dlm/tool/main.c | 105 +++-- fence/agents/apc/fence_apc.py | 20 +- fence/agents/egenera/fence_egenera.pl | 2 +- fence/agents/ldom/Makefile | 5 + fence/agents/ldom/fence_ldom.py | 101 +++++ fence/agents/lib/fencing.py.py | 26 +- fence/agents/scsi/fence_scsi.pl | 97 +++-- fence/agents/scsi/scsi_reserve | 55 +++ fence/agents/xvm/options-ccs.c | 3 + fence/fence_tool/fence_tool.c | 61 +++- fence/fenced/cpg.c | 515 +++++++++++++++++++--- fence/fenced/fd.h | 12 +- fence/fenced/main.c | 8 + fence/fenced/member_cman.c | 2 +- fence/fenced/recover.c | 10 +- fence/man/Makefile | 1 + fence/man/fence_ldom.8 | 114 +++++ gfs-kernel/src/gfs/glock.h | 15 +- gfs-kernel/src/gfs/incore.h | 1 + gfs-kernel/src/gfs/log.c | 27 +- gfs-kernel/src/gfs/mount.c | 3 + gfs-kernel/src/gfs/ops_address.c | 34 +-- gfs-kernel/src/gfs/ops_fstype.c | 2 +- gfs/gfs_fsck/log.c | 10 +- gfs2/libgfs2/ondisk.c | 3 + gfs2/libgfs2/structures.c | 12 +- gfs2/mount/mount.gfs2.c | 1 + gfs2/mount/util.c | 7 + group/daemon/main.c | 4 +- group/dlm_controld/cpg.c | 632 +++++++++++++++++++++++++-- group/dlm_controld/dlm_daemon.h | 7 +- group/dlm_controld/group.c | 7 +- group/dlm_controld/main.c | 14 +- group/gfs_control/main.c | 134 ++++-- group/gfs_controld/cpg-new.c | 27 +- group/gfs_controld/cpg-old.c | 4 +- group/gfs_controld/gfs_daemon.h | 1 + group/gfs_controld/main.c | 2 + group/tool/main.c | 37 ++- rgmanager/include/clulog.h | 139 ------ rgmanager/include/logging.h | 10 + rgmanager/include/resgroup.h | 4 +- rgmanager/src/clulib/Makefile | 6 +- rgmanager/src/clulib/clulog.c | 281 ------------ rgmanager/src/clulib/logging.c | 225 ++++++++++ rgmanager/src/clulib/msg_cluster.c | 6 +- rgmanager/src/daemons/Makefile | 19 +- rgmanager/src/daemons/clurmtabd.c | 52 ++-- rgmanager/src/daemons/depends.c | 14 +- rgmanager/src/daemons/event_config.c | 18 +- rgmanager/src/daemons/fo_domain.c | 90 ++--- rgmanager/src/daemons/groups.c | 104 +++--- rgmanager/src/daemons/main.c | 120 +++--- rgmanager/src/daemons/reslist.c | 7 +- rgmanager/src/daemons/resrules.c | 6 +- rgmanager/src/daemons/restree.c | 11 +- rgmanager/src/daemons/rg_event.c | 40 +- rgmanager/src/daemons/rg_forward.c | 26 +- rgmanager/src/daemons/rg_state.c | 185 ++++---- rgmanager/src/daemons/rg_thread.c | 12 +- rgmanager/src/daemons/service_op.c | 16 +- rgmanager/src/daemons/slang_event.c | 32 +- rgmanager/src/daemons/test.c | 1 + rgmanager/src/daemons/watchdog.c | 8 +- 
rgmanager/src/resources/ocf-shellfuncs | 3 +- rgmanager/src/resources/postgres-8.metadata | 2 +- rgmanager/src/resources/postgres-8.sh | 16 +- rgmanager/src/resources/utils/ra-skelet.sh | 5 + rgmanager/src/utils/Makefile | 13 +- rgmanager/src/utils/clubufflush.c | 12 +- rgmanager/src/utils/clulog.c | 123 ++---- rgmanager/src/utils/clunfsops.c | 18 +- 76 files changed, 2482 insertions(+), 1285 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSPxYZQgUGcMLQ3qJAQJT8A//aFXIad5tlL3can9qHKKU01VJ4JGZ3SOv pBmALC/S6Z+QWzw8e1Uawbu8iuxHUH4rG87GFV+792SBdn9UXotP045UsJVqX3O5 Zcat0T0TkwqSJGD+afT8GAeH7jsw9d92nN30E5THqdwv96EXkLZGDmhQVxrJgTd4 dmNV+010UJCY3Btgu088twv2ggRDOyHKDmAAj0r4vvsm/B5TqXe5Vk2DJrGsLOcL GA7/GxbrgcporBme7dgGBbJFdBLIGDa9UeHsF2GZTilVvSKdYU5LpnM0yo+Sh1Y6 kse5hb7zDzAYm+Ns/9S3skb+N4rQT7ZIYoYaBxZuHSgNVwzbQvTtqgxGNKn3LZe3 oWcub94agRvlJM6vFkITspxfa3Wfg+w3F07qeOCWOUSeEy4cyfrTbf0Q2pMT+YXh jZM5MUEEIgjtPcmL3TYOjj2xhAkzPhF4pQODtuBy4LDNIcVuFav6/22VWzqpwfan lQRAqf+ep5uZA5w9okuUtXfiRdRkQtSu1McW8zgvV0lZ9NdmsFMVbutkzO7DDKLY hA0rQTtsN96Rr+wAVrVrFTjTlkEDK5zVmbrYi5rNxm/2C8961DM/PEz5lizLZiGa c9Ijtc43PPNlhiXUYPNQLmZ3Ynrh7kA5sB+Zyg2TbnjuY73963UY5ksb+t2WpzcQ D8ePL9urQHo= =bsgx -----END PGP SIGNATURE----- From federico.simoncelli at gmail.com Mon Oct 20 11:27:11 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Mon, 20 Oct 2008 13:27:11 +0200 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: References: <20081017154142.GE3299@redhat.com> <20081017160923.GG3299@redhat.com> Message-ID: On Fri, Oct 17, 2008 at 6:56 PM, Federico Simoncelli wrote: > I quickly made a patch (in attachment). > > Tested with: > > # cat /etc/sysconfig/cman > CMAN_QUORUM_TIMEOUT=10 > CMAN_EXPECTED_QUORUM=1 > > Working fine for now. More testing after the weekend. I confirm that the patch works fine. I just need to say that the two_node flag is required anyway: -- Federico. From pbruna at it-linux.cl Mon Oct 20 14:04:39 2008 From: pbruna at it-linux.cl (Patricio A. Bruna) Date: Mon, 20 Oct 2008 11:04:39 -0300 (CLST) Subject: [Linux-cluster] Email alert In-Reply-To: <48F894C4.9010506@nexatech.com> Message-ID: <30509963.1181224511479439.JavaMail.root@lisa.itlinux.cl> I think this is an importan fetaure tha should be in the core of Cluster Suite. As an admin i must know when my servers do a failover. ------------------------------------ Patricio Bruna V. IT Linux Ltda. http://www.it-linux.cl Fono : (+56-2) 333 0578 - Chile Fono: (+54-11) 6632 2760 - Argentina M?vil : (+56-09) 8827 0342 ----- "Jeff Macfarland" escribi?: > Mark Chaney wrote: > > No, but you can use a monitoring service like nagios to do that. > > Is "RIND" (http://sources.redhat.com/cluster/wiki/EventScripting) not > applicable? Or, if implemented in , will it prevent the system from > automated failover of services, etc? I dunno much about slang, but it > looks like it at least supports system() for a quick email if nothing else. > > > > > > > > > *From:* linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Patricio A. Bruna > > *Sent:* Thursday, October 16, 2008 11:03 PM > > *To:* linux-cluster at redhat.com > > *Subject:* [Linux-cluster] Email alert > > > > > > > > Its possible to configure Cluster Suite to send an email when a service > > change host or faild to failover? > > > > ------------------------------------ > > Patricio Bruna V. > > IT Linux Ltda. 
> > http://www.it-linux.cl > > Fono : (+56-2) 333 0578 - Chile > > Fono: (+54-11) 6632 2760 - Argentina > > M?vil : (+56-09) 8827 0342 > > > > > -- > Jeff Macfarland (jmacfarland at nexatech.com) > Nexa Technologies - 972.747.8879 > Systems Administrator > GPG Key ID: 0x5F1CA61B > GPG Key Server: hkp://wwwkeys.pgp.net > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scooter at cgl.ucsf.edu Mon Oct 20 17:15:40 2008 From: scooter at cgl.ucsf.edu (Scooter Morris) Date: Mon, 20 Oct 2008 10:15:40 -0700 Subject: [Linux-cluster] GFS2 Test setup Message-ID: <48FCBCBC.50801@cgl.ucsf.edu> We are in the process of building a cluster, which will hope to put into production when RHEL 5.3 is released. Our plan is to use GFS2, which we've been experimenting with for some time, but we're having some problems. The cluster has 3 nodes, two HP DL580's and one HP DL585 -- we're using ILO for fencing. We want to share a couple of filesystems using GFS2 which are connected to our SAN (an EVA 5000). I've set everything up and it all works as expected, although on occasion, GFS2 just seems to hang. This happens 1-4 times/week. What I note in the logs are a series of dlm messages. On node 1 (for example) I see: dlm: connecting to 3 dlm: connecting to 2 dlm: connecting to 2 dlm: connecting to 2 dlm: connecting to 3 dlm: connecting to 2 dlm: connecting to 2 dlm: connecting to 2 dlm: connecting to 3 dlm: connecting to 3 dlm: connecting to 3 dlm: connecting to 3 On node 2, I see: dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted and on node 3, I see: dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted dlm: got connection from 1 Extra connection from node 1 attempted Now for my questions. I know that GFS2 isn't officially released, yet, and I've been seeing logs of checkins on cluster-devel. Should I updated to the latest GFS2 to continue my testing? Is the dlm condition outlined above a known bug in GFS2 that's been fixed in later releases, or have I tripped over something new? Any suggestions would be appreciated! -- scooter -------------- next part -------------- A non-text attachment was scrubbed... Name: scooter.vcf Type: text/x-vcard Size: 378 bytes Desc: not available URL: From ccaulfie at redhat.com Tue Oct 21 07:31:25 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 21 Oct 2008 08:31:25 +0100 Subject: [Linux-cluster] GFS2 Test setup In-Reply-To: <48FCBCBC.50801@cgl.ucsf.edu> References: <48FCBCBC.50801@cgl.ucsf.edu> Message-ID: <48FD854D.7030409@redhat.com> Scooter Morris wrote: > We are in the process of building a cluster, which will hope to put into > production when RHEL 5.3 is released. Our plan is to use GFS2, which > we've been experimenting with for some time, but we're having some > problems. 
The cluster has 3 nodes, two HP DL580's and one HP DL585 -- > we're using ILO for fencing. We want to share a couple of filesystems > using GFS2 which are connected to our SAN (an EVA 5000). I've set > everything up and it all works as expected, although on occasion, GFS2 > just seems to hang. This happens 1-4 times/week. What I note in the > logs are a series of dlm messages. On node 1 (for example) I see: > > dlm: connecting to 3 > dlm: connecting to 2 > dlm: connecting to 2 > dlm: connecting to 2 > dlm: connecting to 3 > dlm: connecting to 2 > dlm: connecting to 2 > dlm: connecting to 2 > dlm: connecting to 3 > dlm: connecting to 3 > dlm: connecting to 3 > dlm: connecting to 3 > > On node 2, I see: > > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > > and on node 3, I see: > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > dlm: got connection from 1 > Extra connection from node 1 attempted > Those messages are usually caused by routing problems. The DLM binds to the address it is given by cman (see the output of cman_tool status for that) and receiving nodes check incoming packets against that address to make sure that only valid cluster nodes try to make connections. What is happening here (I think - it sounds like a problem I've seen before) is that the packets are being routed though another interface than the one cman is using and the remote node sees them as coming from a different address. This can happen if you have two ethernet interfaces connected to the same physical segment for example. There was a also a bug that could cause this if the routing was not quite so broken but a little odd, though I don't have the bugzilla number to hand, sorry. -- Chrissie From fdinitto at redhat.com Tue Oct 21 07:37:57 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 21 Oct 2008 09:37:57 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.03.08 released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its vibrant community are proud to announce the 2.03.08 release from the STABLE2 branch. The STABLE2 branch collects, on a daily base, all bug fixes and the bare minimal changes required to run the cluster on top of the most recent Linux kernel (2.6.27) and rock solid openais (0.80.3). This release includes some major fixes and addresses 2 security issues in fence agents (apc_snmp and egenera). Please consider upgrading as soon as possible. - From this release GFS1 kernel module is now totally standalone and does not require GFS2 nor a patched upstream kernel to run. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.08.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.03.08.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? 
Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.03.07): Abhijith Das (10): libgfs2: Bug 459630 - GFS2: changes needed to gfs2-utils due to gfs2meta fs changes in bz 457798 gfs-kernel: bz298931 - GFS unlinked inode metadata leak Revert "gfs-kernel: bz298931 - GFS unlinked inode metadata leak" gfs-kernel: GFS: madvise system call causes assertion gfs-kernel: bz457473 - GFS ignore the noatime and nodiratime mount options gfs-kernel: Bug 450209: Create gfs1-specific lock modules + minor fixes to build with 2.6.27 Revert "gfs-kernel: bz457473 - GFS ignore the noatime and nodiratime mount options" gfs-kernel and mount.gfs2: GFS ignore the noatime and nodiratime mount options gfs-kernel: bz 458765 - In linux-2.6.26 / 2.03.06, GFS1 can't create more than 4kb file gfs-kernel: bug 450209 - addendum to previous patch. Removes extraneous lock_dlm_plock.c Andrew Price (1): [GFS2] libgfs2: Build with -fPIC Bob Peterson (7): GFS2: Make gfs2_fsck accept UNLINKED metadata blocks GFS2: sync buffers to disk when rewriting superblock Changes needed to stay compatible with libvolume_id. Changes needed to stay current with libvolume_id. GFS2: gfs2_fsck: fix segfault while running special block lists. GFS: gfs_fsck invalid response to question changes the question gfs-kmod: GFS corruption after forced withdraw Chris Feist (2): fence: fixed a fence storm with fence_egenera cman: fixed makefiles to actually install the vmware manpage Christine Caulfield (10): cman: Return quorum state in a STATECHANGE notification cman: Allow a recently left node to join cleanly. cman: initialise key_filename variable. cman: honour the dirty flag on a node we haven't seen before cman: Clean shutdown_con if the controlling process is killed. dlm: add dlm_tcpdump tool dlm: Make dlm_tcpdump compile for RHEL5 too dlm: make dlm_tcpdump cope with length==0 packets dlm: Add timestamp and full cmdline to dlm_tcpdump dlm: Add dlmtop David Teigland (7): groupd: ignore nolock gfs fix fenced: add skip_undefined option fenced: add skip_undefined option fix groupd: send and check version messages fence_tool: new option to delay before join init.d/cman: use fence_tool -m for two node clusters groupd: send and check version messages fix Fabio M. Di Nitto (11): qdisk: allow scan of sysfs to dive into first level symlinks qdisk: fix sysfs path diving rgmanger: fix handling of VIP v6 ccs: deal with xml file format special case cman: fix broken init script fence: update alom description fence: install fence_alom man page build: bump kernel requirement to 2.6.27 [BUILD] Allow users to set default log dir [FENCE] Fix fence_apc_snmp logging fence egenera: fix logging file Jan Friesse (6): fence: Fence agent for VMware ESX cman: Removed old Perl version of VMware fence agent, so new version is built. fence: Fix fence agent for VMware ESX. fence: Fix fence agent for VMware ESX. 
Fence: Added fence agent for Sun Advanced Lights Out Manager (ALOM) fence: New fence agent for Logical Domains (LDOMs) Lon Hohberger (20): rgmanager: Ancillary fix for rhbz #453000 cman: Fix qdiskd file descriptor leak rgmanager: Make freeze/unfreeze work with central_processing rgmanager: Detect restricted failover domain crash rgmanager: Permit careful restart w/o disturbing services rgmanager: Wait for fence domain join to complete rgmanager: Fix up clusvcadm.8 manual page to show -M option rgmanager: make status poll interval configurable rgmanager: Clean up build rgmanager: Implement enforcement of timeouts on a per-resource basis rgmanager: Make clustat and clusvcadm work faster rgmanager: Resolve hostnames->IPs and back when checking NFS clients cman: Fix broken qdisk main.c patch reverted with scandisk merge cman: Don't let qdiskd update cman if the disk is unavailable cman: show '-d' option in mkqdisk -h and mkqdisk.8 [fence] Make fence_xvmd support reloading of key files on the fly. [rgmanager] Apply patch from Marcelo Azevedo to make migration more robust [rgmanager] Fix live migration option (broken in last commit) group: Allow group_tool ls to be scriptable [fence] Fix fence_xvmd trying to read wrong args from ccs Marek 'marx' Grac (4): [FENCE] Fix #237266 - LPAR/HMC fence agent [FENCE] Fix #460054 - fence_apc fails with pexpect exception [FENCE] Fix #290231 - "Switch (optional)" param does not default to "1" and program fails [RGMANAGER] - Fix #462910 postgres-8.sh and metadata fixes Ryan McCabe (1): cman: Fix typo that caused start-up to fail Ryan O'Hara (5): cman: allow custom xen network bridge scripts groupd: detect dead daemons and remove node from cluster fence_scsi: improve logging for debugging groupd.8: update man page with information about -s option fence_scsi: correctly declare key_list Satoru SATOH (1): fence: Add network interface select option for fence_xvmd Simone Gotti (1): [rgmanager] Fix fuser parsing on later versions of psmisc rohara (2): fence_scsi.pl: check if nodeid is zero scsi_reserve: add restart option ccs/daemon/cnx_mgr.c | 20 +- cman/daemon/ais.c | 3 +- cman/daemon/cmanccs.c | 2 +- cman/daemon/commands.c | 15 +- cman/init.d/cman.in | 32 +- cman/lib/libcman.h | 2 +- cman/man/mkqdisk.8 | 5 +- cman/man/qdisk.5 | 16 + cman/qdisk/disk.c | 3 + cman/qdisk/disk.h | 2 + cman/qdisk/main.c | 83 +++- cman/qdisk/mkqdisk.c | 2 +- cman/qdisk/scandisk.c | 13 +- configure | 9 +- dlm/tests/tcpdump/Makefile | 23 + dlm/tests/tcpdump/README | 21 + dlm/tests/tcpdump/dlm_tcpdump.c | 370 ++++++++++++++ dlm/tests/tcpdump/dlmtop.c | 613 +++++++++++++++++++++++ fence/agents/alom/Makefile | 5 + fence/agents/alom/fence_alom.py | 69 +++ fence/agents/apc/fence_apc.py | 15 +- fence/agents/apc_snmp/fence_apc_snmp.py | 4 +- fence/agents/egenera/fence_egenera.pl | 9 +- fence/agents/ldom/Makefile | 5 + fence/agents/ldom/fence_ldom.py | 101 ++++ fence/agents/lib/fencing.py.py | 54 ++- fence/agents/lpar/fence_lpar.py | 3 +- fence/agents/scsi/fence_scsi.pl | 97 +++-- fence/agents/scsi/scsi_reserve | 55 ++ fence/agents/vmware/fence_vmware.pl | 322 ------------ fence/agents/vmware/fence_vmware.py | 111 ++++ fence/agents/xvm/fence_xvm.c | 2 +- fence/agents/xvm/fence_xvmd.c | 37 ++- fence/agents/xvm/mcast.c | 9 +- fence/agents/xvm/mcast.h | 4 +- fence/agents/xvm/options-ccs.c | 3 + fence/agents/xvm/options.c | 13 + fence/agents/xvm/options.h | 1 + fence/agents/xvm/simple_auth.c | 2 + fence/agents/xvm/xvm.h | 1 + fence/fence_tool/fence_tool.c | 93 ++++- fence/fenced/agent.c | 2 +- 
fence/fenced/fd.h | 4 + fence/fenced/main.c | 32 ++- fence/man/Makefile | 3 + fence/man/fence_alom.8 | 90 ++++ fence/man/fence_ldom.8 | 114 +++++ fence/man/fence_tool.8 | 7 +- fence/man/fence_vmware.8 | 137 +++++ fence/man/fence_xvmd.8 | 3 + gfs-kernel/src/gfs/Makefile | 7 + gfs-kernel/src/gfs/acl.c | 2 +- gfs-kernel/src/gfs/bits.c | 2 +- gfs-kernel/src/gfs/bmap.c | 2 +- gfs-kernel/src/gfs/dio.c | 2 +- gfs-kernel/src/gfs/dir.c | 2 +- gfs-kernel/src/gfs/eaops.c | 2 +- gfs-kernel/src/gfs/eattr.c | 2 +- gfs-kernel/src/gfs/file.c | 2 +- gfs-kernel/src/gfs/gfs.h | 2 +- gfs-kernel/src/gfs/glock.c | 2 +- gfs-kernel/src/gfs/glock.h | 15 +- gfs-kernel/src/gfs/glops.c | 2 +- gfs-kernel/src/gfs/incore.h | 1 + gfs-kernel/src/gfs/inode.c | 10 +- gfs-kernel/src/gfs/ioctl.c | 2 +- gfs-kernel/src/gfs/lm.c | 8 +- gfs-kernel/src/gfs/lm_interface.h | 278 ++++++++++ gfs-kernel/src/gfs/lock_dlm.h | 182 +++++++ gfs-kernel/src/gfs/lock_dlm_lock.c | 527 +++++++++++++++++++ gfs-kernel/src/gfs/lock_dlm_main.c | 40 ++ gfs-kernel/src/gfs/lock_dlm_mount.c | 279 ++++++++++ gfs-kernel/src/gfs/lock_dlm_sysfs.c | 225 +++++++++ gfs-kernel/src/gfs/lock_dlm_thread.c | 367 ++++++++++++++ gfs-kernel/src/gfs/lock_nolock_main.c | 230 +++++++++ gfs-kernel/src/gfs/locking.c | 180 +++++++ gfs-kernel/src/gfs/log.c | 29 +- gfs-kernel/src/gfs/lops.c | 2 +- gfs-kernel/src/gfs/lvb.c | 2 +- gfs-kernel/src/gfs/main.c | 12 +- gfs-kernel/src/gfs/mount.c | 5 +- gfs-kernel/src/gfs/ondisk.c | 2 +- gfs-kernel/src/gfs/ops_address.c | 36 +- gfs-kernel/src/gfs/ops_dentry.c | 2 +- gfs-kernel/src/gfs/ops_export.c | 2 +- gfs-kernel/src/gfs/ops_file.c | 6 +- gfs-kernel/src/gfs/ops_fstype.c | 2 +- gfs-kernel/src/gfs/ops_inode.c | 16 +- gfs-kernel/src/gfs/ops_super.c | 2 +- gfs-kernel/src/gfs/ops_vm.c | 2 +- gfs-kernel/src/gfs/page.c | 2 +- gfs-kernel/src/gfs/proc.c | 2 +- gfs-kernel/src/gfs/quota.c | 2 +- gfs-kernel/src/gfs/recovery.c | 2 +- gfs-kernel/src/gfs/rgrp.c | 2 +- gfs-kernel/src/gfs/super.c | 2 +- gfs-kernel/src/gfs/sys.c | 2 +- gfs-kernel/src/gfs/trans.c | 2 +- gfs-kernel/src/gfs/unlinked.c | 2 +- gfs-kernel/src/gfs/util.c | 2 +- gfs/gfs_fsck/log.c | 10 +- gfs/gfs_mkfs/main.c | 28 +- gfs2/fsck/pass1b.c | 4 +- gfs2/fsck/pass1c.c | 4 +- gfs2/fsck/pass5.c | 14 +- gfs2/libgfs2/Makefile | 1 + gfs2/libgfs2/buf.c | 1 + gfs2/libgfs2/misc.c | 2 +- gfs2/mkfs/main_mkfs.c | 30 +- gfs2/mount/mount.gfs2.c | 1 + gfs2/mount/util.c | 7 + group/daemon/cman.c | 4 + group/daemon/cpg.c | 104 ++++- group/daemon/gd_internal.h | 5 +- group/daemon/main.c | 47 ++- group/man/groupd.8 | 5 + group/tool/main.c | 20 +- make/defines.mk.input | 2 + make/fencebuild.mk | 1 + rgmanager/include/members.h | 3 + rgmanager/include/resgroup.h | 9 +- rgmanager/include/reslist.h | 3 +- rgmanager/man/clurgmgrd.8 | 13 +- rgmanager/man/clusvcadm.8 | 66 ++- rgmanager/src/clulib/members.c | 29 ++ rgmanager/src/clulib/rg_strings.c | 23 +- rgmanager/src/daemons/clurmtabd.c | 4 +- rgmanager/src/daemons/event_config.c | 8 + rgmanager/src/daemons/fo_domain.c | 23 +- rgmanager/src/daemons/groups.c | 123 ++++-- rgmanager/src/daemons/main.c | 55 ++- rgmanager/src/daemons/reslist.c | 7 +- rgmanager/src/daemons/restree.c | 101 ++++- rgmanager/src/daemons/rg_event.c | 58 ++- rgmanager/src/daemons/rg_forward.c | 6 +- rgmanager/src/daemons/rg_locks.c | 3 +- rgmanager/src/daemons/rg_state.c | 51 ++- rgmanager/src/daemons/rg_thread.c | 3 +- rgmanager/src/daemons/service_op.c | 13 +- rgmanager/src/daemons/slang_event.c | 52 ++- rgmanager/src/daemons/test.c | 3 +- rgmanager/src/resources/clusterfs.sh | 4 
+- rgmanager/src/resources/default_event_script.sl | 16 +- rgmanager/src/resources/fs.sh | 4 +- rgmanager/src/resources/ip.sh | 6 +- rgmanager/src/resources/netfs.sh | 4 +- rgmanager/src/resources/nfsclient.sh | 94 ++++- rgmanager/src/resources/postgres-8.metadata | 2 +- rgmanager/src/resources/postgres-8.sh | 16 +- rgmanager/src/resources/service.sh | 21 + rgmanager/src/resources/utils/ra-skelet.sh | 5 + rgmanager/src/resources/vm.sh | 20 +- 152 files changed, 5562 insertions(+), 730 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSP2G3AgUGcMLQ3qJAQJqPQ//bgQo9WmIXcAsHJMHpb5qEUemFlIOXfXQ YkeJM2agZiJK+/WMA2SrhI2FjPg5Q09Cxxi0KQH0F1XVKQiRuTwA2sGf9CJqLYps HhZ2pOlN01ixNCJgfbRrxIqvvnel2nnRmlEpdU4FgHiSgJmFoEvao+Oy8dOVt+/s b2zB8niYXWXsgC+Zx9QH9OmWygsf68pGozwnZ0UBOJluXcVZUdsfKn0WMvYSBTfP fBObpwfK0F3Gpko3747tPYQEyFz6vrsrK2GVqivPuhCTP7ZLSsrUCg8Q9CFWCyA0 El+cBjXBgXRGQsIiWz6bbnnjeos/vM1N7fV9KSqoAxljrb6peSyT8SpBDVTdebF4 6IZNdxrxRPtPMgDjz3wnHwbhF8HhPAcQgWIqHdOBPvEFFkYaSF+1m0WFJHgIdWma zZ6OWiNf+H5SNPZ9t9F0UBZAUbciugORfWUvhPOYJNk4HMSjuxAMCjjBzfNWkIed G8XK8Xtq8g3aNv3CvD54Jl9NGZjQTwJFMwNu2u4RmXYH0L+PgF7fOjfD7P+0WEEB E9uIzCqYv0svvPVCbLVXfk2qdJ2u2veW7REEvZSg2BT1bj4uS+sK7Tv3FK7aaoAx CFOGb2Y6I4vqaJbunPTVWCyVsubtvQJSMBqRMJBhCXKE8o/YLWoyvUrAB+PB2j1d er9d1M5+23g= =ixnW -----END PGP SIGNATURE----- From nick at javacat.f2s.com Tue Oct 21 07:55:41 2008 From: nick at javacat.f2s.com (nick at javacat.f2s.com) Date: Tue, 21 Oct 2008 08:55:41 +0100 Subject: [Linux-cluster] 4 node GFS cluster sanity check Message-ID: <1224575741.48fd8afd581cd@webmail.freedom2surf.net> Hi, RHEL 5.2 32bit kernel 2.6.18-92.1.10.el5PAE kmod-gfs-0.1.23-5.el5 gfs2-utils-0.1.44-1.el5_2.1 gfs-utils-0.1.17-1.el5 cman-2.0.84-2.el5 kmod-gfs2-PAE-1.92-1.1.el5 kmod-gfs2-1.92-1.1.el5 kmod-gfs-PAE-0.1.23-5.el5 rgmanager-2.0.38-2.el5 I have a 4 node cluster. All I want to use is GFS so that each node can read/write to the same directory. I don't want failover. I want to enable as few cluster daemons as possible. Here is my cluster.conf Here' is the output of cman_tool services: type level name id state fence 0 default 00010001 none [1 3 4] dlm 1 clvmd 00020001 none [1 3 4] dlm 1 GFS1 00040001 none [1 4] dlm 1 rgmanager 00010003 none [1 3 4] Here is the output of cman_tool status: Version: 6.1.0 Config Version: 14 Cluster Name: TEST Cluster Id: 1198 Cluster Member: Yes Cluster Generation: 496 Membership state: Cluster-Member Nodes: 7 Expected votes: 6 Total votes: 4 Quorum: 4 Active subsystems: 9 Flags: Dirty Ports Bound: 0 11 177 Node name: fintestapp4 Node ID: 4 Multicast addresses: 239.192.4.178 Node addresses: 192.168.10.68 As you can see Expected votes is 6 while Total votes is 4 - whats wrong here ? I would like confirmation that my cluster.conf is adequate please because after a few reboots last week expected votes and total votes give unexpected results. If any more info is needed, please ask. Many thanks, Nick . From federico.simoncelli at gmail.com Tue Oct 21 11:13:56 2008 From: federico.simoncelli at gmail.com (Federico Simoncelli) Date: Tue, 21 Oct 2008 13:13:56 +0200 Subject: [Linux-cluster] Avoiding fencing loops with startup quorum (patch) In-Reply-To: References: <20081017154142.GE3299@redhat.com> <20081017160923.GG3299@redhat.com> Message-ID: On Mon, Oct 20, 2008 at 1:27 PM, Federico Simoncelli wrote: > I confirm that the patch works fine. 
I just need to say that the > two_node flag is required anyway: > > After some testing I discovered that we need a couple of patches to achieve the behaviour we wanted. Basically if you set two_node=1 the quorum is locked to 1 (no matter what expected_votes you configure). I unlocked the quorum value with the patch "cman-2.0.84-2node2expected.patch" (in attachment). Now we can change the quorum using the expected_votes: # cman_tool expected -e 2 && cman_tool status | grep Quorum Quorum: 2 # cman_tool expected -e 1 && cman_tool status | grep Quorum Quorum: 1 In the same patch I fixed what I believe is a bug. Basically in the file /cman/daemon/cmanccs.c the values node_count and vote_sum are computed only if expected_votes == 0 but those values are used afterwards: if (two_node) { if (node_count != 2 || vote_sum != 2) { To quickly verify the bug: # cman_tool join -w -e 1 It should generate the error "two_node set but there are more than 2 nodes" on any cman version. The second patch "cman-2.0.84-startupquorum.patch" is the init patch to take advantage of the expected_votes and the quorum. Using the following configuration: # cat /etc/sysconfig/cman CMAN_QUORUM_TIMEOUT=10 CMAN_PREJOIN_EXPECTED=2 CMAN_POSTJOIN_EXPECTED=1 Your cluster needs the quorum to be 2 (CMAN_PREJOIN_EXPECTED) within 10 seconds to start. No fencing (and no fencing loops) will be performed if the quorum is less than CMAN_PREJOIN_EXPECTED. After the join session the expected_votes are set back to 1 (CMAN_POSTJOIN_EXPECTED) and the quorum goes back to 1 too. Comments are welcome, -- Federico. -------------- next part -------------- A non-text attachment was scrubbed... Name: cman-2.0.84-2node2expected.patch Type: application/octet-stream Size: 1135 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cman-2.0.84-startupquorum.patch Type: application/octet-stream Size: 1086 bytes Desc: not available URL: From scooter at cgl.ucsf.edu Tue Oct 21 13:55:06 2008 From: scooter at cgl.ucsf.edu (Scooter Morris) Date: Tue, 21 Oct 2008 06:55:06 -0700 Subject: [Linux-cluster] GFS2 Test setup In-Reply-To: <48FD854D.7030409@redhat.com> References: <48FCBCBC.50801@cgl.ucsf.edu> <48FD854D.7030409@redhat.com> Message-ID: <48FDDF3A.1000102@cgl.ucsf.edu> Christine, Thanks for the information. I checked my routing, and other than the zero conf route on the same interface as my private network, everything seems clean. I moved the zero conf route to the public network, so we'll see if that fixes anything. Also, the multicast route doesn't get involved, does it? The default route is on our public network (obviously) and the nodes should be talking to each other over the private network (according to cman_tool status), but I don't know what interface the multicasts will be sent out from. I wouldn't think that would impact dlm, only the heartbeat, right? Thanks again! -- scooter Christine Caulfield wrote: > Scooter Morris wrote: > >> We are in the process of building a cluster, which will hope to put into >> production when RHEL 5.3 is released. Our plan is to use GFS2, which >> we've been experimenting with for some time, but we're having some >> problems. The cluster has 3 nodes, two HP DL580's and one HP DL585 -- >> we're using ILO for fencing. We want to share a couple of filesystems >> using GFS2 which are connected to our SAN (an EVA 5000). I've set >> everything up and it all works as expected, although on occasion, GFS2 >> just seems to hang. This happens 1-4 times/week. 
What I note in the >> logs are a series of dlm messages. On node 1 (for example) I see: >> >> dlm: connecting to 3 >> dlm: connecting to 2 >> dlm: connecting to 2 >> dlm: connecting to 2 >> dlm: connecting to 3 >> dlm: connecting to 2 >> dlm: connecting to 2 >> dlm: connecting to 2 >> dlm: connecting to 3 >> dlm: connecting to 3 >> dlm: connecting to 3 >> dlm: connecting to 3 >> >> On node 2, I see: >> >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> >> and on node 3, I see: >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> dlm: got connection from 1 >> Extra connection from node 1 attempted >> >> > > Those messages are usually caused by routing problems. The DLM binds to > the address it is given by cman (see the output of cman_tool status for > that) and receiving nodes check incoming packets against that address to > make sure that only valid cluster nodes try to make connections. > > What is happening here (I think - it sounds like a problem I've seen > before) is that the packets are being routed though another interface > than the one cman is using and the remote node sees them as coming from > a different address. This can happen if you have two ethernet interfaces > connected to the same physical segment for example. > > There was a also a bug that could cause this if the routing was not > quite so broken but a little odd, though I don't have the bugzilla > number to hand, sorry. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From max.liccardo at gmail.com Wed Oct 22 21:50:46 2008 From: max.liccardo at gmail.com (max liccardo) Date: Wed, 22 Oct 2008 23:50:46 +0200 Subject: [Linux-cluster] ipfails Message-ID: hi cluster masters, I'm using linux-HA and linux-cluster on separate project. I'm wondering if I can use with linux-cluster something like the linux-ha ping nodes, in order to have some sort of "network quorum". bye GnuPG public key available on wwwkeys.eu.pgp.net Key ID: D01F1CAD Key fingerprint: 992D 91B7 9682 9735 12C9 402D AD3F E4BB D01F 1CAD "la velocit? induce all'oblio, la lentezza al ricordo" From jallgood at ohl.com Thu Oct 23 14:08:21 2008 From: jallgood at ohl.com (Allgood, John) Date: Thu, 23 Oct 2008 09:08:21 -0500 Subject: [Linux-cluster] Cluster/GFS issue. Message-ID: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> Hello All I am having some issues with building an eight node Xen cluster. Let me give some background first. We have 8 dell PE 1950 with 32GB RAM connected via dual brocade fiber switchs to an EMC CX-310. The guests images are being stored on the SAN. We are using EMC Powerpath to hand the multipathing. The Operating system is Redhat Advanced Platform 5.2 . The filesystems on the SAN were created using Conga CLVM/GFS1. We have the heartbeat on an separate private network. The fence devices are Dell DRAC's. 
Here is the problem that we are having. We can't on an consistent basic get the GFS filesystem mounted. On the nodes that don't connect it will just hang on bootup trying to mount the GFS filesystem. All nodes come up and join the cluster at this point but only 1 or 2 will completely come up with the GFS filesystem mounted. If we do an interactive startup and skip the GFS part all systems will come up on the cluster but without the gfs mounted. At this point I am not sure what to do next. I am thinking it may be a problem with the way the GFS filesystem was created. We just used the default settings. The LVM is 668GB created from an RAID10. Best Regards John Allgood Senior Systems Administrator Turbo, division of OHL 2251 Jesse Jewell Pky. NE Gainesville, GA 30507 tel: (678) 989-3051 fax: (770) 531-7878 jallgood at ohl.com www.ohl.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Thu Oct 23 15:49:59 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 23 Oct 2008 11:49:59 -0400 Subject: [Linux-cluster] Email alert In-Reply-To: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> References: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> Message-ID: <1224776999.32460.79.camel@ayanami> On Fri, 2008-10-17 at 01:03 -0300, Patricio A. Bruna wrote: > Its possible to configure Cluster Suite to send an email when a > service change host or faild to failover? Syslog-ng can be configured to do this. -- Lon > From lhh at redhat.com Thu Oct 23 15:51:16 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 23 Oct 2008 11:51:16 -0400 Subject: [Linux-cluster] Email alert In-Reply-To: <48F894C4.9010506@nexatech.com> References: <12438484.3551224216194223.JavaMail.root@lisa.itlinux.cl> <028901c93017$e230f490$a692ddb0$@com> <48F894C4.9010506@nexatech.com> Message-ID: <1224777076.32460.82.camel@ayanami> On Fri, 2008-10-17 at 08:36 -0500, Jeff Macfarland wrote: > Mark Chaney wrote: > > No, but you can use a monitoring service like nagios to do that. > > Is "RIND" (http://sources.redhat.com/cluster/wiki/EventScripting) not > applicable? Or, if implemented in , will it prevent the system from > automated failover of services, etc? I dunno much about slang, but it > looks like it at least supports system() for a quick email if nothing else. Fork/exec during failover needs to be managed carefully. I haven't tried system() from within a RIND script. It should probably work (or, maybe we should provide an email interface). -- Lon From lhh at redhat.com Thu Oct 23 15:55:41 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 23 Oct 2008 11:55:41 -0400 Subject: [Linux-cluster] 4 node GFS cluster sanity check In-Reply-To: <1224575741.48fd8afd581cd@webmail.freedom2surf.net> References: <1224575741.48fd8afd581cd@webmail.freedom2surf.net> Message-ID: <1224777341.32460.85.camel@ayanami> On Tue, 2008-10-21 at 08:55 +0100, nick at javacat.f2s.com wrote: > Hi, > > RHEL 5.2 32bit kernel 2.6.18-92.1.10.el5PAE > Here is the output of cman_tool status: > Version: 6.1.0 > Config Version: 14 > Cluster Name: TEST > Cluster Id: 1198 > Cluster Member: Yes > Cluster Generation: 496 > Membership state: Cluster-Member > Nodes: 7 > Expected votes: 6 > Total votes: 4 > Quorum: 4 > Active subsystems: 9 > Flags: Dirty > Ports Bound: 0 11 177 > Node name: fintestapp4 > Node ID: 4 > Multicast addresses: 239.192.4.178 > Node addresses: 192.168.10.68 > > As you can see Expected votes is 6 while Total votes is 4 - whats wrong here ? 
> > I would like confirmation that my cluster.conf is adequate please because after a few reboots last week expected votes and total votes give unexpected > results. > > If any more info is needed, please ask. How can you have 7 nodes and expected votes of 6 with a 4-node cluster configuration? It looks like you have two clusters with the same name on the same subnet. Also, you should chkconfig --del rgmanager if you're not doing failover. You don't need it. -- Lon From lhh at redhat.com Thu Oct 23 15:56:49 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 23 Oct 2008 11:56:49 -0400 Subject: [Linux-cluster] ipfails In-Reply-To: References: Message-ID: <1224777409.32460.87.camel@ayanami> On Wed, 2008-10-22 at 23:50 +0200, max liccardo wrote: > hi cluster masters, > I'm using linux-HA and linux-cluster on separate projects. > I'm wondering if I can use with linux-cluster something like the > linux-ha ping nodes, in order to have some sort of "network quorum". > bye Currently, no, but you could build a daemon which does this and talks to the CMAN quorum API. -- Lon From lhh at redhat.com Thu Oct 23 16:01:57 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 23 Oct 2008 12:01:57 -0400 Subject: [Linux-cluster] Two nodes cluster issue without shared storage issue In-Reply-To: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net> Message-ID: <1224777717.32460.92.camel@ayanami> On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) wrote: > Hi, > > I want to set up a two node cluster, I use active/standby mode to run > my service. I need even one node's hardware failure such as power cut, > another node still can handover from failure node and the provide the > service. > > In my environment, I have no shared storage, so I can not use quorum > disk. Is there any other way to implement it? I searched and found > 'tiebreaker IP' may feed my request, but I can not found any hints on > how to configure it ? Since you have no shared data, you may be able to run without fencing. That should be pretty straightforward, but you might need to comment out the "fenced" startup from the cman init script. In this case, the worst that will happen is both nodes will end up running the service at the same time in the event of a network partition. The other down side is that if the cluster divides into two partitions and later merges back into one partition, I don't think certain things will work right; you will need to detect this event and reboot one of the nodes. -- Lon From billpp at gmail.com Thu Oct 23 16:42:40 2008 From: billpp at gmail.com (Flavio Junior) Date: Thu, 23 Oct 2008 14:42:40 -0200 Subject: [Linux-cluster] Two nodes cluster issue without shared storage issue In-Reply-To: <1224777717.32460.92.camel@ayanami> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net> <1224777717.32460.92.camel@ayanami> Message-ID: <58aa8d780810230942s421d74dfqaf61190be764b57@mail.gmail.com> Well... If you are using an active/standby scenario without shared storage, you can probably make use of CARP/UCARP http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol http://www.ucarp.org/project/ucarp -- Flávio do Carmo Júnior aka waKKu On Thu, Oct 23, 2008 at 2:01 PM, Lon Hohberger wrote: > On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) > wrote: > > Hi, > > > > I want to set up a two node cluster, I use active/standby mode to run > > my service.
I need even one node's hardware failure such as power cut, > > another node still can handover from failure node and the provide the > > service. > > > > In my environment, I have no shared storage, so I can not use quorum > > disk. Is there any other way to implement it? I searched and found > > 'tiebreaker IP' may feed my request, but I can not found any hints on > > how to configure it ? > > Since you have no shared data, you may be able to run without fencing. > > That should be pretty straightforward, but you might need to comment out > the "fenced" startup from the cman init script. > > In this case, the worst that will happen is both nodes will end up > running the service at the same time in the event of a network > partition. > > The other down side is that if the cluster divides into two partitions > and later merges back into one partition, I don't think certain things > will work right; you will need to detect this event and reboot one of > the nodes. > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mockey.chen at nsn.com Fri Oct 24 02:35:48 2008 From: mockey.chen at nsn.com (Chen, Mockey (NSN - CN/Cheng Du)) Date: Fri, 24 Oct 2008 10:35:48 +0800 Subject: [Linux-cluster] Two nodes cluster issue without shared storageissue In-Reply-To: <1224777717.32460.92.camel@ayanami> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net> <1224777717.32460.92.camel@ayanami> Message-ID: <174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net> >-----Original Message----- >From: linux-cluster-bounces at redhat.com >[mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Lon >Hohberger >Sent: 2008?10?24? 0:02 >To: linux clustering >Subject: Re: [Linux-cluster] Two nodes cluster issue without >shared storageissue > >On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) >wrote: >> Hi, >> >> I want to set up a two node cluster, I use active/standby >mode to run >> my service. I need even one node's hardware failure such as >power cut, >> another node still can handover from failure node and the >provide the >> service. >> >> In my environment, I have no shared storage, so I can not use quorum >> disk. Is there any other way to implement it? I searched and found >> 'tiebreaker IP' may feed my request, but I can not found any >hints on >> how to configure it ? > >Since you have no shared data, you may be able to run without fencing. > >That should be pretty straightforward, but you might need to >comment out the "fenced" startup from the cman init script. > >In this case, the worst that will happen is both nodes will >end up running the service at the same time in the event of a >network partition. > >The other down side is that if the cluster divides into two >partitions and later merges back into one partition, I don't >think certain things will work right; you will need to detect >this event and reboot one of the nodes. > >-- Lon I know such defects in two node cluster. Since our service is mission critical, I want to know how to avoid such failure case ? Thanks. 
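The usual mitigations for this are working fencing plus either a quorum disk or a third voting node, both of which come up later in this thread. For reference, a rough sketch of the special two-node mode in cluster.conf on RHEL 5 follows; it does not by itself solve the split-brain case Lon describes, and the attribute names and commands should be checked against the cluster.conf and ccs_tool documentation for your release:

  # /etc/cluster/cluster.conf fragment (two-node mode, sketch only):
  #   <cman two_node="1" expected_votes="1"/>
  # after editing, increment config_version in the file, then propagate
  # and activate the new version:
  ccs_tool update /etc/cluster/cluster.conf
  cman_tool version -r <new_config_version>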
From mockey.chen at nsn.com Fri Oct 24 02:41:11 2008 From: mockey.chen at nsn.com (Chen, Mockey (NSN - CN/Cheng Du)) Date: Fri, 24 Oct 2008 10:41:11 +0800 Subject: [Linux-cluster] Two nodes cluster issue without shared storageissue In-Reply-To: <58aa8d780810230942s421d74dfqaf61190be764b57@mail.gmail.com> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net><1224777717.32460.92.camel@ayanami> <58aa8d780810230942s421d74dfqaf61190be764b57@mail.gmail.com> Message-ID: <174CED94DD8DC54AB888B56E103B118742183D@CNBEEXC007.nsn-intra.net> > > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Flavio Junior > Sent: 2008年10月24日 0:43 > To: linux clustering > Subject: Re: [Linux-cluster] Two nodes cluster issue without shared storageissue > > > Well.. If you are using an active/standby scenario, without a shared storage, probably you can make use of CARP/UCARP > > http://en.wikipedia.org/wiki/Common_Address_Redundancy_Protocol > http://www.ucarp.org/project/ucarp > I think CARP would fulfill my current requirement, but we have chosen RHCS as our cluster suite. It is very difficult to change it. Anyhow, thanks for your suggestion. From raju.rajsand at gmail.com Fri Oct 24 07:11:33 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Fri, 24 Oct 2008 12:41:33 +0530 Subject: [Linux-cluster] Cluster/GFS issue. In-Reply-To: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> Message-ID: <8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com> Greetings, 2008/10/23 Allgood, John > Here is the problem that we are having. We can't on an consistent basic > get the GFS filesystem mounted. On > Just a hunch... Can't say if it will help... Have you tried putting the mount command in rc.local instead of /etc/fstab? Regards, Rajagopal -------------- next part -------------- An HTML attachment was scrubbed... URL: From raju.rajsand at gmail.com Fri Oct 24 07:14:16 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Fri, 24 Oct 2008 12:44:16 +0530 Subject: [Linux-cluster] Cluster/GFS issue. In-Reply-To: <8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> <8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com> Message-ID: <8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> > > 2008/10/23 Allgood, John > > >> Here is the problem that we are having. We can't on an consistent basic >> get the GFS filesystem mounted. On >> > > > Just a hunch... Can't say if it will help... > > Have you tried putting the mount command in rc.local instead of /etc/fstab? > Start clvmd too in rc.local, of course before mounting, and chain the commands using && > > Regards, > > Rajagopal > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Santosh.Panigrahi at in.unisys.com Fri Oct 24 11:59:05 2008 From: Santosh.Panigrahi at in.unisys.com (Panigrahi, Santosh Kumar) Date: Fri, 24 Oct 2008 17:29:05 +0530 Subject: [Linux-cluster] cluster between 2 Xen guests where guests are on different hosts In-Reply-To: <476A18A2.2080406@wasko.pl> References: <45824.79.10.137.147.1197661952.squirrel@picard.linux.it> <1197672129.18614.2.camel@localhost.localdomain> <26275.62.101.100.5.1197887615.squirrel@picard.linux.it><1197915660.4959.24.camel@ayanami.boston.devel.redhat.com> <476A18A2.2080406@wasko.pl> Message-ID: Hello, I am using RHEL 5.2 + RHCS and have configured a 2-node cluster in a Xen virtual environment for testing purposes only. These 2 cluster nodes are 2 virtual guests (p6pv1, p7pv1) and each virtual guest is on a different host/Dom-0 (p6 & p7). I have already gone through the older questions on this forum with similar problems and also the wiki page (http://sources.redhat.com/cluster/wiki/VMClusterCookbook). But I am still a bit confused regarding Xen fencing in this scenario. I don't want to do any live migration here, only failover/failback of services between the 2 cluster nodes. I want to know whether I have to configure fencing only between the 2 guests (using fence_xvm) or between the 2 hosts (using fence_xvmd) as well, given that my cluster nodes are 2 Xen guests. I am configuring the cluster using luci and the options are as follows. Fence Daemon Properties: Post Fail Delay - 0 Post Join Delay - 3 Run XVM fence daemon - tick mark selected XVM fence daemon key distribution: Enter a node hostname from the host cluster - ? Enter a node hostname from the hosted (virtual) cluster - ? Can someone please help me in this regard? Regards, Santosh From jeff.sturm at eprize.com Fri Oct 24 14:09:57 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 24 Oct 2008 10:09:57 -0400 Subject: RE: [Linux-cluster] cluster between 2 Xen guests where guests are ondifferent hosts In-Reply-To: References: <45824.79.10.137.147.1197661952.squirrel@picard.linux.it> <1197672129.18614.2.camel@localhost.localdomain> <26275.62.101.100.5.1197887615.squirrel@picard.linux.it><1197915660.4959.24.camel@ayanami.boston.devel.redhat.com><476A18A2.2080406@wasko.pl> Message-ID: <64D0546C5EBBD147B75DE133D798665F0180693B@hugo.eprize.local> Santosh, The hosts are responsible for fencing the guests, so, as far as I know it is not possible to use fence_xvm without also configuring fence_xvmd. In our configuration we run an "inner" cluster amongst the DomU guests, and an "outer" cluster amongst the Dom0 hosts. The outer cluster starts fence_xvmd whenever cman starts. The fence_xvmd daemon listens for multicast traffic from fence_xvm. We have a dedicated VLAN for this traffic in our configuration. (Make sure your routing tables are adjusted for this, if needed--whereas aisexec figures out what interfaces to use for multicast automatically based on the bind address, fence_xvm does not.) If your Dom0 hosts are not part of a cluster, it may be possible to run fence_xvmd standalone. We have not attempted to do so, so I can't say whether it can work.
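To make the key handling concrete, here is a minimal sketch; the dd command and the standalone fence_xvmd -LX invocation are taken from Lon's follow-up later in this thread, the host and guest names are Santosh's (p6/p7, p6pv1/p7pv1), and everything should be verified against the fence_xvm/fence_xvmd man pages for your release:

  # on one dom0: generate a shared key
  dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
  # copy it to the other dom0 and to both guests -- fence_xvmd on the hosts
  # and fence_xvm on the guests must read the same key
  scp /etc/cluster/fence_xvm.key p7:/etc/cluster/
  scp /etc/cluster/fence_xvm.key p6pv1:/etc/cluster/
  scp /etc/cluster/fence_xvm.key p7pv1:/etc/cluster/
  # on each dom0: either let cman start fence_xvmd, or, if the dom0s are
  # not clustered, try running it standalone (e.g. from rc.local):
  fence_xvmd -LX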
Jeff > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Panigrahi, Santosh Kumar > Sent: Friday, October 24, 2008 7:59 AM > To: linux clustering > Subject: [Linux-cluster] cluster between 2 Xen guests where > guests are ondifferent hosts > > Hello, > > I am using RHEL5.2+RHCS and configured a 2 node cluster in > XEN virtual environment for testing purpose only. These 2 > cluster nodes are 2 virtual guests (p6pv1, p7pv1) and each > virtual guest is on different hosts/ Dom-0s (p6 & p7). I have > already gone through the older questions on this forum with > similar problems and also the wiki page > (http://sources.redhat.com/cluster/wiki/VMClusterCookbook ). > But still I have confused a bit regarding the Xen fencing in > this scenarios. > I don't want to do any live migration here and only to do a > failover/failback services between 2 cluster nodes. I want to > know whether I have to configure fencing only between the 2 > guests (using > fence_xvm) or also between the 2 hosts (using fence_xvmd) as > well, where as my cluster nodes are 2 Xen guests. > > I am configuring the cluster using luci and there options are > as follows. > > Fence Daemon Properties: > Post Fail Delay - 0 > Post Join Delay - 3 > Run XVM fence daemon - tick mark selected > > XVM fence daemon key distribution: > Enter a node hostname from the host cluster - ? > Enter a node hostname from the hosted (virtual) cluster _ ? > > Can someone please help me in this regard? > > Regards, > Santosh > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From cedwards at smartechcorp.net Fri Oct 24 14:13:07 2008 From: cedwards at smartechcorp.net (Chris Edwards) Date: Fri, 24 Oct 2008 10:13:07 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> <8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com> <8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> Message-ID: <61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net> If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! --- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.sturm at eprize.com Fri Oct 24 14:18:08 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 24 Oct 2008 10:18:08 -0400 Subject: [Linux-cluster] Two nodes cluster issue without sharedstorageissue In-Reply-To: <174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net><1224777717.32460.92.camel@ayanami> <174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net> Message-ID: <64D0546C5EBBD147B75DE133D798665F0180693C@hugo.eprize.local> For what it's worth, considerations like these have caused us to abandon any efforts to build a 2-node cluster. >From this point forward all our RHCS deployments will have a minimum of 3 nodes, even if the 3rd node is a small node that provides no resources and only exists for arbitration purposes. 
(It was going to be that, or a quorum disk for our application, but we have no experience running a quorum disk over the long-haul in a production envrironment.) Hope this helps someone. > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chen, > Mockey (NSN - CN/Cheng Du) > Sent: Thursday, October 23, 2008 10:36 PM > To: linux clustering > Subject: RE: [Linux-cluster] Two nodes cluster issue without > sharedstorageissue > > > > >-----Original Message----- > >From: linux-cluster-bounces at redhat.com > >[mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Lon > >Hohberger > >Sent: 2008?10?24? 0:02 > >To: linux clustering > >Subject: Re: [Linux-cluster] Two nodes cluster issue without shared > >storageissue > > > >On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) > >wrote: > >> Hi, > >> > >> I want to set up a two node cluster, I use active/standby > >mode to run > >> my service. I need even one node's hardware failure such as > >power cut, > >> another node still can handover from failure node and the > >provide the > >> service. > >> > >> In my environment, I have no shared storage, so I can not > use quorum > >> disk. Is there any other way to implement it? I searched and found > >> 'tiebreaker IP' may feed my request, but I can not found any > >hints on > >> how to configure it ? > > > >Since you have no shared data, you may be able to run > without fencing. > > > >That should be pretty straightforward, but you might need to comment > >out the "fenced" startup from the cman init script. > > > >In this case, the worst that will happen is both nodes will end up > >running the service at the same time in the event of a network > >partition. > > > >The other down side is that if the cluster divides into two > partitions > >and later merges back into one partition, I don't think > certain things > >will work right; you will need to detect this event and > reboot one of > >the nodes. > > > >-- Lon > > I know such defects in two node cluster. > Since our service is mission critical, I want to know how to > avoid such failure case ? > > Thanks. > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From jeff.sturm at eprize.com Fri Oct 24 14:20:04 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 24 Oct 2008 10:20:04 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com><8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com><8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> <61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net> Message-ID: <64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> Chris, Are you running a clustered LVM, and do you expect to be able to use Xen migration? Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:13 AM To: linux clustering Subject: [Linux-cluster] Cluster and LVG/LV If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! 
--- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... URL: From cedwards at smartechcorp.net Fri Oct 24 14:28:35 2008 From: cedwards at smartechcorp.net (Chris Edwards) Date: Fri, 24 Oct 2008 10:28:35 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com><8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com><8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> <61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net> <64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> Message-ID: <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> Yes to both. Right now the cluster is running GFS and I can migrate VM's between the nodes. This question is coming up because I have been trying to do a snap shot and I realized the snapshot is stored on the Volume Group that the LV is located on. I did not realize this and I cannot do a snapshot because I did not leave enough space in each of the Volume Groups for each of the VM's. --- Chris Edwards Smartech Corp. Div. of AirNet Group http://www.airnetgroup.com http://www.smartechcorp.net cedwards at smartechcorp.net P: 423-664-7678 x114 C: 423-593-6964 F: 423-664-7680 From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 10:20 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Chris, Are you running a clustered LVM, and do you expect to be able to use Xen migration? Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:13 AM To: linux clustering Subject: [Linux-cluster] Cluster and LVG/LV If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! --- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodrique.heron at baruch.cuny.edu Fri Oct 24 14:33:07 2008 From: rodrique.heron at baruch.cuny.edu (Rodrique Heron) Date: Fri, 24 Oct 2008 10:33:07 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com><8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com><8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com><61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net><64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> Message-ID: <4901DCA3.3050501@baruch.cuny.edu> I am would be interested what others have to say as well, but I have one VG that I carved a LV from for each VM. Chris Edwards wrote: > > Yes to both. Right now the cluster is running GFS and I can migrate > VM's between the nodes. > > > > This question is coming up because I have been trying to do a snap > shot and I realized the snapshot is stored on the Volume Group that > the LV is located on. 
I did not realize this and I cannot do a > snapshot because I did not leave enough space in each of the Volume > Groups for each of the VM's. > > > > --- > > > > Chris Edwards > Smartech Corp. > Div. of AirNet Group > > http://www.airnetgroup.com > > http://www.smartechcorp.net > > cedwards at smartechcorp.net > P: 423-664-7678 x114 > > C: 423-593-6964 > > F: 423-664-7680 > > > > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Jeff Sturm > *Sent:* Friday, October 24, 2008 10:20 AM > *To:* linux clustering > *Subject:* RE: [Linux-cluster] Cluster and LVG/LV > > > > Chris, > > > > Are you running a clustered LVM, and do you expect to be able to use > Xen migration? > > > > Jeff > > > > ------------------------------------------------------------------------ > > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Chris Edwards > *Sent:* Friday, October 24, 2008 10:13 AM > *To:* linux clustering > *Subject:* [Linux-cluster] Cluster and LVG/LV > > If I am installing multiple Xen VM's in a cluster with shared > iSCSI space with Logical Volumes for each virtual machine should I > put each LV in its own logical volume group or should I use one > logical volume group for all of the LV's? > > > > Thanks! > > > > --- > > > > Chris Edwards > > > > > -- Rodrique Heron -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rodrique_heron.vcf Type: text/x-vcard Size: 328 bytes Desc: not available URL: From v.galande at gmail.com Fri Oct 24 14:50:46 2008 From: v.galande at gmail.com (varun) Date: Fri, 24 Oct 2008 18:50:46 +0400 Subject: [Linux-cluster] RE:Two nodes cluster issue without shared storage issue Message-ID: <7e19e5b90810240750n1e5aa2abq8a5af976f1677703@mail.gmail.com> Hi Lon I think you should try Linux Virtual Server ( LVS ) here this will definitely help you. You can see the details over here . www.linuxvirtualserver.org Br,Varun
-- Regards, Varun Galande +971505589029 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.sturm at eprize.com Fri Oct 24 14:59:44 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 24 Oct 2008 10:59:44 -0400 Subject: RE: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com><8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com><8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com><61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net><64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> Message-ID: <64D0546C5EBBD147B75DE133D798665F01806944@hugo.eprize.local> Okay. For CLVM it probably makes the most sense to run one big volume group across your cluster, but there's also the option of running a non-clustered LVM on each Dom0 host. The latter would only work for you however if you don't require Xen migration. I see 3 options for central storage in a Xen cluster, each with their own drawbacks: 1) Run a single clustered volume group across all hosts, containing one or more PV's from your shared storage. 2) Run a non-clustered volume group on each host, each with a distinct PV carved out of your shared storage.
3) Export storage for each host individually from your SAN, i.e. rely completely on your SAN for volume management. With this you don't need LVM at all. Both 1) and 3) allow you to use Xen migration. 2) is feasible if you don't need to migrate guests online. Our problem with 1) is snapshot support, and that we could not get pvmove to work acceptably well. (We had to make the entire volume group inactive before pvmove would even run--I'm not sure if it is expected, or what we did wrong.) We've tried and failed at 1), and will now be attempting 3). This gives us a lot of flexibility on a storage appliance that supports snapshots. I'd still like to have pvmove work so we could migrate online from one SAN to another, if needed, but I haven't been able to get it to work acceptably well. Also I thought I had read that snapshots are not supported by a clustered LVM? That would be difficult for us too, as we are relying on snapshots for a backup mechanism. Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:29 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Yes to both. Right now the cluster is running GFS and I can migrate VM's between the nodes. This question is coming up because I have been trying to do a snap shot and I realized the snapshot is stored on the Volume Group that the LV is located on. I did not realize this and I cannot do a snapshot because I did not leave enough space in each of the Volume Groups for each of the VM's. --- Chris Edwards Smartech Corp. Div. of AirNet Group http://www.airnetgroup.com http://www.smartechcorp.net cedwards at smartechcorp.net P: 423-664-7678 x114 C: 423-593-6964 F: 423-664-7680 From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 10:20 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Chris, Are you running a clustered LVM, and do you expect to be able to use Xen migration? Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:13 AM To: linux clustering Subject: [Linux-cluster] Cluster and LVG/LV If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! --- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lhh at redhat.com Fri Oct 24 15:37:30 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 24 Oct 2008 11:37:30 -0400 Subject: [Linux-cluster] cluster between 2 Xen guests where guests are ondifferent hosts In-Reply-To: <64D0546C5EBBD147B75DE133D798665F0180693B@hugo.eprize.local> References: <45824.79.10.137.147.1197661952.squirrel@picard.linux.it> <1197672129.18614.2.camel@localhost.localdomain> <26275.62.101.100.5.1197887615.squirrel@picard.linux.it> <1197915660.4959.24.camel@ayanami.boston.devel.redhat.com> <476A18A2.2080406@wasko.pl> <64D0546C5EBBD147B75DE133D798665F0180693B@hugo.eprize.local> Message-ID: <1224862650.32460.126.camel@ayanami> On Fri, 2008-10-24 at 10:09 -0400, Jeff Sturm wrote: > Santosh, > > The hosts are responsible for fencing the guests, so, as far as I know > it is not possible to use fence_xvm without also configuring fence_xvmd. Correct. > In our configuration we run an "inner" cluster amongst the DomU guests, > and an "outer" cluster amongst the Dom0 hosts. The outer cluster starts > fence_xvmd whenever cman starts. The fence_xvmd daemon listens for > multicast traffic from fence_xvm. We have a dedicated VLAN for this > traffic in our configuration. (Make sure your routing tables are > adjusted for this, if needed--whereas aisexec figures out what > interfaces to use for multicast automatically based on the bind address, > fence_xvm does not.) > If your Dom0 hosts are not part of a cluster, it may be possible to run > fence_xvmd standalone. We have not attempted to do so, so I can't say > whether it can work. fence_xvmd -LX (need to add to rc.local or something) You could (in theory) do fencing using multiple fence_xvm agent instances to try different keys (one per physical host) so that if fencing a host on one key succeeds, you also ensure the other guest isn't running the node. For example, if you had two keys on the guests, you could do the following: * dd if=/dev/urandom of=/etc/cluster/fence_xvm-host1.key bs=4k count=1 * dd if=/dev/urandom of=/etc/cluster/fence_xvm-host2.key bs=4k count=1 * scp /etc/cluster/fence_xvm-host1.key host1:/etc/cluster/fence_xvm.key * scp /etc/cluster/fence_xvm-host2.key host2:/etc/cluster/fence_xvm.key (don't forget to copy /etc/cluster/fence_xvm* to the other virtual guest too!) Set up two fencing devices: Set up the nodes to fence both: ... maybe that would work. The reason you need a cluster in dom0 typically is because we use Checkpointing to distribute the states of VMs cluster-wide. If there's no cluster, then you can't distribute the states. Now, key files are, well, key here - fence_xvmd assumes that the admin does the correct thing (not reusing key files on multiple clusters), so therefore it returns "ok" if it's not got information about a guest... Suppose virt1 (on guest1) fails: * virt2 sends a request that only host2 listens to to try to fence virt1. - "Never heard of that domain, so it must be safe" * virt2 sends a request that only host1 listens to to try to fence virt1. - "Ok, it's running locally -> kill it and return success" -- Lon From rodrique.heron at baruch.cuny.edu Fri Oct 24 15:54:48 2008 From: rodrique.heron at baruch.cuny.edu (Rodrique Heron) Date: Fri, 24 Oct 2008 11:54:48 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <64D0546C5EBBD147B75DE133D798665F01806944@hugo.eprize.local> Message-ID: <856BC630A1FD6540B94C17D96A17843F157BDA@mb01.baruch.local> Jeff- Thanks for your thoughts, until now I never really considered exporting storage from the SAN to my domU's. 
I can definitely see the advantage here, using the SAN snapshot utilities, it most cases it can be automated. I am interested in how you would accomplish similar functionality to the SAN snapshot, using LVM snapshots (let's say lvm snapshot support worked well). ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 11:00 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Okay. For CLVM it probably makes the most sense to run one big volume group across your cluster, but there's also the option of running a non-clustered LVM on each Dom0 host. The latter would only work for you however if you don't require Xen migration. I see 3 options for central storage in a Xen cluster, each with their own drawbacks: 1) Run a single clustered volume group across all hosts, containing one or more PV's from your shared storage. 2) Run a non-clustered volume group on each host, each with a distinct PV carved out of your shared storage. 3) Export storage for each host individually from your SAN, i.e. rely completely on your SAN for volume management. With this you don't need LVM at all. Both 1) and 3) allow you to use Xen migration. 2) is feasible if you don't need to migrate guests online. Our problem with 1) is snapshot support, and that we could not get pvmove to work acceptably well. (We had to make the entire volume group inactive before pvmove would even run--I'm not sure if it is expected, or what we did wrong.) We've tried and failed at 1), and will now be attempting 3). This gives us a lot of flexibility on a storage appliance that supports snapshots. I'd still like to have pvmove work so we could migrate online from one SAN to another, if needed, but I haven't been able to get it to work acceptably well. Also I thought I had read that snapshots are not supported by a clustered LVM? That would be difficult for us too, as we are relying on snapshots for a backup mechanism. Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:29 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Yes to both. Right now the cluster is running GFS and I can migrate VM's between the nodes. This question is coming up because I have been trying to do a snap shot and I realized the snapshot is stored on the Volume Group that the LV is located on. I did not realize this and I cannot do a snapshot because I did not leave enough space in each of the Volume Groups for each of the VM's. --- Chris Edwards Smartech Corp. Div. of AirNet Group http://www.airnetgroup.com http://www.smartechcorp.net cedwards at smartechcorp.net P: 423-664-7678 x114 C: 423-593-6964 F: 423-664-7680 From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 10:20 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Chris, Are you running a clustered LVM, and do you expect to be able to use Xen migration? 
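On the LVM snapshot question, a sketch with plain, non-clustered LVM follows; the volume names are invented for illustration, and, as Jeff notes above, snapshots on a clustered VG are doubtful, so this only applies to the single-host or non-clustered case:

  # one LV per VM carved out of a shared VG, snapshotted before backup
  lvcreate -L 20G -n vm1-disk vg_xen
  lvcreate -s -L 2G -n vm1-snap /dev/vg_xen/vm1-disk
  # ... back up /dev/vg_xen/vm1-snap, then drop the snapshot ...
  lvremove -f /dev/vg_xen/vm1-snap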
Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:13 AM To: linux clustering Subject: [Linux-cluster] Cluster and LVG/LV If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! --- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodrique.heron at baruch.cuny.edu Fri Oct 24 16:00:01 2008 From: rodrique.heron at baruch.cuny.edu (Rodrique Heron) Date: Fri, 24 Oct 2008 12:00:01 -0400 Subject: [Linux-cluster] Two nodes cluster issue without sharedstorageissue In-Reply-To: <64D0546C5EBBD147B75DE133D798665F0180693C@hugo.eprize.local> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net><1224777717.32460.92.camel@ayanami><174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net> <64D0546C5EBBD147B75DE133D798665F0180693C@hugo.eprize.local> Message-ID: <20081024155620.BAE1B15EC7D@smtp25.baruch.cuny.edu> Jeff I have two node cluster only because my storage array only supports two nodes, can I add a third node without it having access to the storage? I am using CLVM to run domU's. Jeff Sturm wrote: > > For what it's worth, considerations like these have caused us to > abandon any efforts to build a 2-node cluster. > > >From this point forward all our RHCS deployments will have a minimum > of 3 nodes, even if the 3rd node is a small node that provides no > resources and only exists for arbitration purposes. (It was going to > be that, or a quorum disk for our application, but we have no > experience running a quorum disk over the long-haul in a production > envrironment.) > > Hope this helps someone. > > > -----Original Message----- > > From: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chen, > > Mockey (NSN - CN/Cheng Du) > > Sent: Thursday, October 23, 2008 10:36 PM > > To: linux clustering > > Subject: RE: [Linux-cluster] Two nodes cluster issue without > > sharedstorageissue > > > > > > > > >-----Original Message----- > > >From: linux-cluster-bounces at redhat.com > > >[mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Lon > > >Hohberger > > >Sent: 2008?10?24? 0:02 > > >To: linux clustering > > >Subject: Re: [Linux-cluster] Two nodes cluster issue without shared > > >storageissue > > > > > >On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) > > >wrote: > > >> Hi, > > >> > > >> I want to set up a two node cluster, I use active/standby > > >mode to run > > >> my service. I need even one node's hardware failure such as > > >power cut, > > >> another node still can handover from failure node and the > > >provide the > > >> service. > > >> > > >> In my environment, I have no shared storage, so I can not > > use quorum > > >> disk. Is there any other way to implement it? I searched and found > > >> 'tiebreaker IP' may feed my request, but I can not found any > > >hints on > > >> how to configure it ? > > > > > >Since you have no shared data, you may be able to run > > without fencing. > > > > > >That should be pretty straightforward, but you might need to comment > > >out the "fenced" startup from the cman init script. 
> > > > > >In this case, the worst that will happen is both nodes will end up > > >running the service at the same time in the event of a network > > >partition. > > > > > >The other down side is that if the cluster divides into two > > partitions > > >and later merges back into one partition, I don't think > > certain things > > >will work right; you will need to detect this event and > > reboot one of > > >the nodes. > > > > > >-- Lon > > > > I know such defects in two node cluster. > > Since our service is mission critical, I want to know how to > > avoid such failure case ? > > > > Thanks. > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Rodrique Heron -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rodrique_heron.vcf Type: text/x-vcard Size: 342 bytes Desc: not available URL: From jeff.sturm at eprize.com Fri Oct 24 16:29:29 2008 From: jeff.sturm at eprize.com (Jeff Sturm) Date: Fri, 24 Oct 2008 12:29:29 -0400 Subject: [Linux-cluster] Two nodes cluster issue without sharedstorageissue In-Reply-To: <20081024155620.BAE1B15EC7D@smtp25.baruch.cuny.edu> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net><1224777717.32460.92.camel@ayanami><174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net><64D0546C5EBBD147B75DE133D798665F0180693C@hugo.eprize.local> <20081024155620.BAE1B15EC7D@smtp25.baruch.cuny.edu> Message-ID: <64D0546C5EBBD147B75DE133D798665F0180694D@hugo.eprize.local> Certainly. That third node need not run any cluster services at all other than fencing, and yet would guarantee a quorum in the even of loss of any single node. A quorum disk would theoretically solve this as well, but for reasons I can't quite articulate I suspect the three-node cluster is superior. (Besides, we have stockpiles of cheap hardware where I'm at, so there's little reason for us not to do it.) ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Rodrique Heron Sent: Friday, October 24, 2008 12:00 PM To: linux clustering Subject: Re: [Linux-cluster] Two nodes cluster issue without sharedstorageissue Jeff I have two node cluster only because my storage array only supports two nodes, can I add a third node without it having access to the storage? I am using CLVM to run domU's. Jeff Sturm wrote: For what it's worth, considerations like these have caused us to abandon any efforts to build a 2-node cluster. >From this point forward all our RHCS deployments will have a minimum of 3 nodes, even if the 3rd node is a small node that provides no resources and only exists for arbitration purposes. (It was going to be that, or a quorum disk for our application, but we have no experience running a quorum disk over the long-haul in a production envrironment.) Hope this helps someone. 
> -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chen, > Mockey (NSN - CN/Cheng Du) > Sent: Thursday, October 23, 2008 10:36 PM > To: linux clustering > Subject: RE: [Linux-cluster] Two nodes cluster issue without > sharedstorageissue > > > > >-----Original Message----- > >From: linux-cluster-bounces at redhat.com > >[mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Lon > >Hohberger > >Sent: 2008?10?24? 0:02 > >To: linux clustering > >Subject: Re: [Linux-cluster] Two nodes cluster issue without shared > >storageissue > > > >On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - CN/Cheng Du) > >wrote: > >> Hi, > >> > >> I want to set up a two node cluster, I use active/standby > >mode to run > >> my service. I need even one node's hardware failure such as > >power cut, > >> another node still can handover from failure node and the > >provide the > >> service. > >> > >> In my environment, I have no shared storage, so I can not > use quorum > >> disk. Is there any other way to implement it? I searched and found > >> 'tiebreaker IP' may feed my request, but I can not found any > >hints on > >> how to configure it ? > > > >Since you have no shared data, you may be able to run > without fencing. > > > >That should be pretty straightforward, but you might need to comment > >out the "fenced" startup from the cman init script. > > > >In this case, the worst that will happen is both nodes will end up > >running the service at the same time in the event of a network > >partition. > > > >The other down side is that if the cluster divides into two > partitions > >and later merges back into one partition, I don't think > certain things > >will work right; you will need to detect this event and > reboot one of > >the nodes. > > > >-- Lon > > I know such defects in two node cluster. > Since our service is mission critical, I want to know how to > avoid such failure case ? > > Thanks. > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Rodrique Heron -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodrique.heron at baruch.cuny.edu Fri Oct 24 16:52:00 2008 From: rodrique.heron at baruch.cuny.edu (Rodrique Heron) Date: Fri, 24 Oct 2008 12:52:00 -0400 Subject: [Linux-cluster] Two nodes cluster issue without sharedstorageissue In-Reply-To: <64D0546C5EBBD147B75DE133D798665F0180694D@hugo.eprize.local> References: <174CED94DD8DC54AB888B56E103B11873C2CE1@CNBEEXC007.nsn-intra.net><1224777717.32460.92.camel@ayanami><174CED94DD8DC54AB888B56E103B118742183A@CNBEEXC007.nsn-intra.net><64D0546C5EBBD147B75DE133D798665F0180693C@hugo.eprize.local> <20081024155620.BAE1B15EC7D@smtp25.baruch.cuny.edu> <64D0546C5EBBD147B75DE133D798665F0180694D@hugo.eprize.local> Message-ID: <20081024164819.2A9B315EC49@smtp25.baruch.cuny.edu> Thanks Jeff, I share the same reasons. Jeff Sturm wrote: > Certainly. That third node need not run any clusterservices atall > other than fencing, and yet would guarantee a quorum in the even of > loss of any single node. > A quorum disk would theoretically solve this as well, but for reasons > I can't quite articulate I suspect the three-node cluster is superior. > (Besides, we have stockpiles of cheap hardware where I'm at, so > there's little reason for usnot to do it.) 
> > ------------------------------------------------------------------------ > *From:* linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Rodrique > Heron > *Sent:* Friday, October 24, 2008 12:00 PM > *To:* linux clustering > *Subject:* Re: [Linux-cluster] Two nodes cluster issue without > sharedstorageissue > > Jeff > > I have two node cluster only because my storage array only > supports two nodes, can I add a third node without it having > access to the storage? I am using CLVM to run domU's. > > > > Jeff Sturm wrote: >> >> For what it's worth, considerations like these have caused us to >> abandon any efforts to build a 2-node cluster. >> >> >From this point forward all our RHCS deployments will have a >> minimum of 3 nodes, even if the 3rd node is a small node that >> provides no resources and only exists for arbitration purposes. >> (It was going to be that, or a quorum disk for our application, >> but we have no experience running a quorum disk over the >> long-haul in a production envrironment.) >> >> Hope this helps someone. >> >> > -----Original Message----- >> > From: linux-cluster-bounces at redhat.com >> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chen, >> > Mockey (NSN - CN/Cheng Du) >> > Sent: Thursday, October 23, 2008 10:36 PM >> > To: linux clustering >> > Subject: RE: [Linux-cluster] Two nodes cluster issue without >> > sharedstorageissue >> > >> > >> > >> > >-----Original Message----- >> > >From: linux-cluster-bounces at redhat.com >> > >[mailto:linux-cluster-bounces at redhat.com] On Behalf Of ext Lon >> > >Hohberger >> > >Sent: 2008?10?24? 0:02 >> > >To: linux clustering >> > >Subject: Re: [Linux-cluster] Two nodes cluster issue without >> shared >> > >storageissue >> > > >> > >On Thu, 2008-10-16 at 17:10 +0800, Chen, Mockey (NSN - >> CN/Cheng Du) >> > >wrote: >> > >> Hi, >> > >> >> > >> I want to set up a two node cluster, I use active/standby >> > >mode to run >> > >> my service. I need even one node's hardware failure such as >> > >power cut, >> > >> another node still can handover from failure node and the >> > >provide the >> > >> service. >> > >> >> > >> In my environment, I have no shared storage, so I can not >> > use quorum >> > >> disk. Is there any other way to implement it? I searched and >> found >> > >> 'tiebreaker IP' may feed my request, but I can not found any >> > >hints on >> > >> how to configure it ? >> > > >> > >Since you have no shared data, you may be able to run >> > without fencing. >> > > >> > >That should be pretty straightforward, but you might need to >> comment >> > >out the "fenced" startup from the cman init script. >> > > >> > >In this case, the worst that will happen is both nodes will end up >> > >running the service at the same time in the event of a network >> > >partition. >> > > >> > >The other down side is that if the cluster divides into two >> > partitions >> > >and later merges back into one partition, I don't think >> > certain things >> > >will work right; you will need to detect this event and >> > reboot one of >> > >the nodes. >> > > >> > >-- Lon >> > >> > I know such defects in two node cluster. >> > Since our service is mission critical, I want to know how to >> > avoid such failure case ? >> > >> > Thanks. 
>> > >> > >> > >> > -- >> > Linux-cluster mailing list >> > Linux-cluster at redhat.com >> > https://www.redhat.com/mailman/listinfo/linux-cluster >> > >> > >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Rodrique Heron > > > -- Rodrique Heron Systems Administrator/ Red Hat Certified Engineer Baruch College 1 Bernard Baruch Way, Box H-0910 New York, NY 10010 Phone: (646) 312-1055 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rodrique_heron.vcf Type: text/x-vcard Size: 342 bytes Desc: not available URL: From cedwards at smartechcorp.net Fri Oct 24 17:26:10 2008 From: cedwards at smartechcorp.net (Chris Edwards) Date: Fri, 24 Oct 2008 13:26:10 -0400 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <856BC630A1FD6540B94C17D96A17843F157BDA@mb01.baruch.local> References: <64D0546C5EBBD147B75DE133D798665F01806944@hugo.eprize.local> <856BC630A1FD6540B94C17D96A17843F157BDA@mb01.baruch.local> Message-ID: <61252CC53A97634BA52256DCF2344FBC66C68DE314@OFFICEEXCHANGE.office.smartechcorp.net> Thanks for the advice! So could I use vgmerge to merge all of my volume groups into one large volume group then? --- Chris Edwards From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Rodrique Heron Sent: Friday, October 24, 2008 11:55 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Jeff- Thanks for your thoughts, until now I never really considered exporting storage from the SAN to my domU's. I can definitely see the advantage here, using the SAN snapshot utilities, it most cases it can be automated. I am interested in how you would accomplish similar functionality to the SAN snapshot, using LVM snapshots (let's say lvm snapshot support worked well). ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 11:00 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Okay. For CLVM it probably makes the most sense to run one big volume group across your cluster, but there's also the option of running a non-clustered LVM on each Dom0 host. The latter would only work for you however if you don't require Xen migration. I see 3 options for central storage in a Xen cluster, each with their own drawbacks: 1) Run a single clustered volume group across all hosts, containing one or more PV's from your shared storage. 2) Run a non-clustered volume group on each host, each with a distinct PV carved out of your shared storage. 3) Export storage for each host individually from your SAN, i.e. rely completely on your SAN for volume management. With this you don't need LVM at all. Both 1) and 3) allow you to use Xen migration. 2) is feasible if you don't need to migrate guests online. Our problem with 1) is snapshot support, and that we could not get pvmove to work acceptably well. (We had to make the entire volume group inactive before pvmove would even run--I'm not sure if it is expected, or what we did wrong.) We've tried and failed at 1), and will now be attempting 3). This gives us a lot of flexibility on a storage appliance that supports snapshots. I'd still like to have pvmove work so we could migrate online from one SAN to another, if needed, but I haven't been able to get it to work acceptably well. 
Also I thought I had read that snapshots are not supported by a clustered LVM? That would be difficult for us too, as we are relying on snapshots for a backup mechanism. Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:29 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Yes to both. Right now the cluster is running GFS and I can migrate VM's between the nodes. This question is coming up because I have been trying to do a snap shot and I realized the snapshot is stored on the Volume Group that the LV is located on. I did not realize this and I cannot do a snapshot because I did not leave enough space in each of the Volume Groups for each of the VM's. --- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeff Sturm Sent: Friday, October 24, 2008 10:20 AM To: linux clustering Subject: RE: [Linux-cluster] Cluster and LVG/LV Chris, Are you running a clustered LVM, and do you expect to be able to use Xen migration? Jeff ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Chris Edwards Sent: Friday, October 24, 2008 10:13 AM To: linux clustering Subject: [Linux-cluster] Cluster and LVG/LV If I am installing multiple Xen VM's in a cluster with shared iSCSI space with Logical Volumes for each virtual machine should I put each LV in its own logical volume group or should I use one logical volume group for all of the LV's? Thanks! --- Chris Edwards -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Fri Oct 24 20:36:41 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 24 Oct 2008 16:36:41 -0400 Subject: [Linux-cluster] ipfails In-Reply-To: <1224777409.32460.87.camel@ayanami> References: <1224777409.32460.87.camel@ayanami> Message-ID: <1224880601.32460.138.camel@ayanami> On Thu, 2008-10-23 at 11:56 -0400, Lon Hohberger wrote: > On Wed, 2008-10-22 at 23:50 +0200, max liccardo wrote: > > hi cluster masters, > > I'm using linux-HA and linux-cluster on separate project. > > I'm wondering if I can use with linux-cluster something like the > > linux-ha ping nodes, in order to have some sort of "network quorum". > > bye > > Currently, no, but you could build a daemon which did this and talked > to > the CMAN quorum API to do this. Actually, I have something partially prototyped to do "simple IP tiebreaker" sort of thing like this. It's based on what we had in clumanager a few years ago, and only works in limited cases (i.e. 2 node clusters). It kind of plugs in the same way as qdiskd but is far simpler (and, of course, doesn't require a disk). I could finish up pretty quickly if you cared to test it. -- Lon From greg.hellings at harcourt.com Fri Oct 24 22:41:15 2008 From: greg.hellings at harcourt.com (Greg Hellings) Date: Fri, 24 Oct 2008 15:41:15 -0700 Subject: [Linux-cluster] LVS-DR question Message-ID: Does anyone know if the VIP in a LVS-DR config has to be on the same subnet as the RIP? And If not, is there some reason that all the RIPs would need to be in the same subnet? 
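To make the direct-routing question concrete, this is roughly what the two halves look like when the VIP and the RIPs share a subnet; all addresses are invented, and the realserver side can equally be done with arptables_jf instead of the lo:0 trick:

    # on the director
    ipvsadm -A -t 192.168.1.100:80 -s wlc
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.11 -g    # -g selects direct routing
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.12 -g

    # on each realserver: hold the VIP on lo:0 and keep it out of ARP
    ifconfig lo:0 192.168.1.100 netmask 255.255.255.255 up
    sysctl -w net.ipv4.conf.lo.arp_ignore=1
    sysctl -w net.ipv4.conf.lo.arp_announce=2
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2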
-- Greg From wferi at niif.hu Sun Oct 26 10:36:52 2008 From: wferi at niif.hu (Ferenc Wagner) Date: Sun, 26 Oct 2008 11:36:52 +0100 Subject: [Linux-cluster] Cluster and LVG/LV In-Reply-To: <64D0546C5EBBD147B75DE133D798665F01806944@hugo.eprize.local> (Jeff Sturm's message of "Fri, 24 Oct 2008 10:59:44 -0400") References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> <8786b91c0810240011u71e91161ia374c591d5f3cadb@mail.gmail.com> <8786b91c0810240014g7ff6ed7cw5b8fd853b9ca256c@mail.gmail.com> <61252CC53A97634BA52256DCF2344FBC66C68DE2FF@OFFICEEXCHANGE.office.smartechcorp.net> <64D0546C5EBBD147B75DE133D798665F0180693D@hugo.eprize.local> <61252CC53A97634BA52256DCF2344FBC66C68DE303@OFFICEEXCHANGE.office.smartechcorp.net> <64D0546C5EBBD147B75DE133D798665F01806944@hugo.eprize.local> Message-ID: <87mygrr6bf.fsf@szonett.ki.iif.hu> "Jeff Sturm" writes: > 1) Run a single clustered volume group across all hosts, containing > one or more PV's from your shared storage. > > 3) Export storage for each host individually from your SAN, > i.e. rely completely on your SAN for volume management. With this > you don't need LVM at all. > > Our problem with 1) is snapshot support, and that we could not get > pvmove to work acceptably well. (We had to make the entire volume > group inactive before pvmove would even run--I'm not sure if it is > expected, or what we did wrong.) It helps if you do LVM in your domU's, too. Or only there, if you use 3). -- Feri. From linux-cluster at via-rs.net Mon Oct 27 02:22:41 2008 From: linux-cluster at via-rs.net (CR Lou) Date: Mon, 27 Oct 2008 00:22:41 -0200 Subject: [Linux-cluster] fence_ilo + HP ProLiant DL580 G5 Message-ID: <000301c937da$e86a0430$0200a8c0@beta> Hi cluster men, we are in the process of building a cluster to virtualization a lot of low-end servers using xen. Our plan is to use rhcs and clvm for this but iLO insists on not working... :-| The cluster has 2 nodes, two HP ProLiant DL580 G5 (x86_64). We're using multi-vlan access to reach a lot of networks and EMC symmetrix more multipath to share the disks. Well, everything is ok except when I need to use iLO to provide one secure way for ha. Follows my cluster.conf: node1# clustat Cluster Status for alpha @ Sun Oct 26 21:32:52 2008 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ node1.ha 1 Online, Local, rgmanager node2.ha 2 Online, rgmanager /dev/mapper/3600604800002877515624d4630383434p1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- vm:rh52-para-virt01 node1.ha started vm:w2003-vm01 node2.ha started Look, when I try to fence the another node it doesn't works. node1# fence_node node2.ha node1# echo $? 1 node1# tail -1 /var/log/messages Oct 26 21:44:44 xxxxx fence_node[1480]: Fence of "node2.ha" was unsuccessful But if I try to fence via agent it works fine. node1# ./fence_ilo -o off -l Administrator -p xxxx -a 10.127.255.130 success echo $? 0 # clustat Cluster Status for alpha @ Sun Oct 26 21:56:36 2008 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ node1.ha 1 Online, Local, rgmanager node2.ha 2 Offline /dev/mapper/3600604800002877515624d4630383434p1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- vm:rh52-para-virt01 node1.ha started vm:w2003-vm01 node2.ha started Now node2 is offline but the service remains there, that is, node1 doesn't take over the vm:w2003-vm01 from node2. Follow the messages.log. 
node1# tail -50 /var/log/messages Oct 26 21:44:44 xxxxx fence_node[1480]: Fence of "node2.ha" was unsuccessful Oct 26 21:54:49 xxxxx openais[31517]: [TOTEM] The token was lost in the OPERATIONAL state. Oct 26 21:54:49 xxxxx openais[31517]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Oct 26 21:54:49 xxxxx openais[31517]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Oct 26 21:54:49 xxxxx openais[31517]: [TOTEM] entering GATHER state from 2. Oct 26 21:54:50 xxxxx qdiskd[31565]: Writing eviction notice for node 2 Oct 26 21:54:51 xxxxx qdiskd[31565]: Node 2 evicted Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] entering GATHER state from 0. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] Creating commit token because I am the rep. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] Saving state aru 75 high seq received 75 Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] Storing new sequence id for ring 14ac Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] entering COMMIT state. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] entering RECOVERY state. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] position [0] member 10.127.255.137: Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] previous ring seq 5288 rep 10.127.255.137 Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] aru 75 high delivered 75 received flag 1 Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] Did not need to originate any messages in recovery. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] Sending initial ORF token Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] CLM CONFIGURATION CHANGE Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] New Configuration: Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.137) Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] Members Left: Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.138) Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] Members Joined: Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] CLM CONFIGURATION CHANGE Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] New Configuration: Oct 26 21:54:54 xxxxx clurgmgrd[31715]: State change: node2.ha DOWN Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.137) Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] Members Left: Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] Members Joined: Oct 26 21:54:54 xxxxx openais[31517]: [SYNC ] This node is within the primary component and will provide service. Oct 26 21:54:54 xxxxx openais[31517]: [TOTEM] entering OPERATIONAL state. 
Oct 26 21:54:54 xxxxx openais[31517]: [CLM ] got nodejoin message 10.127.255.137 Oct 26 21:54:54 xxxxx openais[31517]: [CPG ] got joinlist message from node 1 Oct 26 21:54:54 xxxxx kernel: dlm: closing connection to node 2 Oct 26 21:54:54 xxxxx fenced[31533]: node2.ha not a cluster member after 0 sec post_fail_delay Oct 26 21:54:54 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:54:54 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:54:59 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:54:59 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:55:04 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:55:04 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:55:09 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:55:09 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:55:14 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:55:14 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:55:19 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 21:55:19 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 21:55:24 xxxxx fenced[31533]: fencing node "node2.ha" Until I to force via fenced_override node1# echo node2.ha > /var/run/cluster/fenced_override tail -1 /var/log/messages Oct 26 22:05:08 xxxxx clurgmgrd[31715]: Taking over service vm:w2003-vm01 from down member node2.ha Another example, if I simply to put the iface of heartbeat to off on node2 (for simulate the problem), the same thing happens. node2# ifconfig eth1 down node1# tail -50 /var/log/messages Oct 26 23:39:07 xxxxx openais[31517]: [TOTEM] The token was lost in the OPERATIONAL state. Oct 26 23:39:07 xxxxx openais[31517]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Oct 26 23:39:07 xxxxx openais[31517]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Oct 26 23:39:07 xxxxx openais[31517]: [TOTEM] entering GATHER state from 2. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] entering GATHER state from 0. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] Creating commit token because I am the rep. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] Saving state aru 52 high seq received 52 Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] Storing new sequence id for ring 14b4 Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] entering COMMIT state. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] entering RECOVERY state. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] position [0] member 10.127.255.137: Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] previous ring seq 5296 rep 10.127.255.137 Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] aru 52 high delivered 52 received flag 1 Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] Did not need to originate any messages in recovery. 
Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] Sending initial ORF token Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] CLM CONFIGURATION CHANGE Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] New Configuration: Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.137) Oct 26 23:39:12 xxxxx clurgmgrd[31715]: State change: node2.ha DOWN Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] Members Left: Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.138) Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] Members Joined: Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] CLM CONFIGURATION CHANGE Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] New Configuration: Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] r(0) ip(10.127.255.137) Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] Members Left: Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] Members Joined: Oct 26 23:39:12 xxxxx openais[31517]: [SYNC ] This node is within the primary component and will provide service. Oct 26 23:39:12 xxxxx openais[31517]: [TOTEM] entering OPERATIONAL state. Oct 26 23:39:12 xxxxx openais[31517]: [CLM ] got nodejoin message 10.127.255.137 Oct 26 23:39:12 xxxxx openais[31517]: [CPG ] got joinlist message from node 1 Oct 26 23:39:12 xxxxx kernel: dlm: closing connection to node 2 Oct 26 23:39:12 xxxxx fenced[31533]: node2.ha not a cluster member after 0 sec post_fail_delay Oct 26 23:39:12 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 23:39:12 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 23:39:17 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 23:39:17 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 23:39:22 xxxxx fenced[31533]: fencing node "node2.ha" Oct 26 23:39:22 xxxxx fenced[31533]: fence "node2.ha" failed Oct 26 23:39:27 xxxxx fenced[31533]: fencing node "node2.ha" node1# clustat Cluster Status for alpha @ Sun Oct 26 23:41:20 2008 Member Status: Quorate Member Name ID Status ------ ---- ---- ------ node1.ha 1 Online, Local, rgmanager node2.ha 2 Offline /dev/mapper/3600604800002877515624d4630383434p1 0 Online, Quorum Disk Service Name Owner (Last) State ------- ---- ----- ------ ----- vm:rh52-para-virt01 node1.ha started vm:w2003-vm01 node2.ha started I believe that node1 had power off node2 via iLO because node2 don't responded anymore but node1 didn't take over the service like it should to do. Finally for try to solve this problem I loaded these modules on both nodes from hp-OpenIPMI-8.1.0-104.rhel5.rpm package but nothing changed. /opt/hp/hp-OpenIPMI/bin/2.6.18-92.el5xen/ipmi_devintf.ko /opt/hp/hp-OpenIPMI/bin/2.6.18-92.el5xen/ipmi_msghandler.ko /opt/hp/hp-OpenIPMI/bin/2.6.18-92.el5xen/ipmi_poweroff.ko /opt/hp/hp-OpenIPMI/bin/2.6.18-92.el5xen/ipmi_si.ko /opt/hp/hp-OpenIPMI/bin/2.6.18-92.el5xen/ipmi_watchdog.ko ps. I'm using one rh5.2, kernel-2.6.18-92.el5, cman-2.0.84-2.el5, rgmanager-2.0.38-2.el5 and iLO 1.50 on HPs. tks a lot. -- Renan From oioi at cableplus.com.cn Mon Oct 27 02:39:54 2008 From: oioi at cableplus.com.cn (Lu Wen-yan) Date: Mon, 27 Oct 2008 10:39:54 +0800 Subject: [Linux-cluster] cman killed by node 2 for reason 2 Message-ID: <804362282.20081027103954@cableplus.com.cn> Hello linux-cluster, Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service. Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state. 
Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.1 Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.2 Oct 26 18:45:08 cms2 openais[13904]: [CPG ] got joinlist message from node 1 [b][color=Red]Oct 26 18:45:08 cms2 openais[13904]: [CMAN ] cman killed by node 2 for reason 2 I get an error msg when I restart cman. Anyone know what is " reason 2 " ? Thanks -- Best regards, Lu mailto:oioi at cableplus.com.cn From tom at netspot.com.au Mon Oct 27 06:20:21 2008 From: tom at netspot.com.au (Tom Lanyon) Date: Mon, 27 Oct 2008 16:50:21 +1030 Subject: [Linux-cluster] SELinux contexts not propagating between GFS nodes Message-ID: <88A3D32D-1F53-4CAC-950A-D3EBCAE47547@netspot.com.au> Hi list, I'm seeing an occasional issue where an SELinux file context is applied on a cluster node to a file on a GFS1 filesystem, but the old context remains on one (or more) other nodes. A simple 'restorecon /path/to/file' fixes the context on the "broken" node. We're running CentOS 5.2 x86_64 with all the latest stable cluster and GFS versions. Any ideas why this could be happening and/or how to debug it? Thanks, Tom -- Tom Lanyon Systems Administrator NetSpot Pty Ltd From ccaulfie at redhat.com Mon Oct 27 09:36:34 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 27 Oct 2008 09:36:34 +0000 Subject: [Linux-cluster] cman killed by node 2 for reason 2 In-Reply-To: <804362282.20081027103954@cableplus.com.cn> References: <804362282.20081027103954@cableplus.com.cn> Message-ID: <49058BA2.5000505@redhat.com> Lu Wen-yan wrote: > Hello linux-cluster, > > Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service. > Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state. > Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.1 > Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.2 > Oct 26 18:45:08 cms2 openais[13904]: [CPG ] got joinlist message from node 1 > [b][color=Red]Oct 26 18:45:08 cms2 openais[13904]: [CMAN ] cman killed by node 2 for reason 2 > > I get an error msg when I restart cman. Anyone know what is " reason 2 " ? > It means you have a very old version of cman that needs updating ;-) That message as from 5.0 and lots of things have been fixed (including that error) since then . Chrissie From lhh at redhat.com Mon Oct 27 15:00:56 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 27 Oct 2008 11:00:56 -0400 Subject: [Linux-cluster] ipfails In-Reply-To: <1224880601.32460.138.camel@ayanami> References: <1224777409.32460.87.camel@ayanami> <1224880601.32460.138.camel@ayanami> Message-ID: <1225119656.32460.139.camel@ayanami> On Fri, 2008-10-24 at 16:36 -0400, Lon Hohberger wrote: > On Thu, 2008-10-23 at 11:56 -0400, Lon Hohberger wrote: > > On Wed, 2008-10-22 at 23:50 +0200, max liccardo wrote: > > > hi cluster masters, > > > I'm using linux-HA and linux-cluster on separate project. > > > I'm wondering if I can use with linux-cluster something like the > > > linux-ha ping nodes, in order to have some sort of "network quorum". > > > bye > > > > Currently, no, but you could build a daemon which did this and talked > > to > > the CMAN quorum API to do this. > > Actually, I have something partially prototyped to do "simple IP > tiebreaker" sort of thing like this. It's based on what we had in > clumanager a few years ago, and only works in limited cases (i.e. 2 node > clusters). 
> > It kind of plugs in the same way as qdiskd but is far simpler (and, of > course, doesn't require a disk). I could finish up pretty quickly if > you cared to test it. Fun with the CMAN quorum API - an IPv4 tiebreaker a la RHCS3 / clumanager 1.2.x http://people.redhat.com/lhh/qnet.tar.gz [sha256sum] 769a35d8ec7b2ebdec9ba1439d6ff98a5d6b5dddf5f9c3ce7cb3d97fd4e7d1ad -- Lon From lhh at redhat.com Mon Oct 27 18:06:42 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 27 Oct 2008 14:06:42 -0400 Subject: [Linux-cluster] LVS-DR question In-Reply-To: References: Message-ID: <1225130802.32460.159.camel@ayanami> On Fri, 2008-10-24 at 15:41 -0700, Greg Hellings wrote: > Does anyone know if the VIP in a LVS-DR config has to be on the same subnet > as the RIP? If I understand the question.... Yes. All realservers and the director's VIP need to be on the same subnet. I usually put the VIP on the realservers' public NICs and use arptables_jf to prevent the VIPs from sending/receiving ARP requests for the VIP. One trick you can do lets you put the VIP on the realservers on lo:0, but I've never done it. Either way, the realservers' "real" IP needs to be on the same subnet as the VIP. -- Lon From greg.hellings at harcourt.com Mon Oct 27 20:33:33 2008 From: greg.hellings at harcourt.com (Greg Hellings) Date: Mon, 27 Oct 2008 13:33:33 -0700 Subject: [Linux-cluster] LVS-DR question In-Reply-To: <1225130802.32460.159.camel@ayanami> Message-ID: Thank you. That directly answers my question. BTW, I am doing the lo:0 trick with net.ipv4.conf.lo.arp_ignore = 1 net.ipv4.conf.lo.arp_announce = 2 net.ipv4.conf.all.arp_ignore = 1 net.ipv4.conf.all.arp_announce = 2 And it works great. -- Greg On 10/27/08 11:06 AM, "Lon Hohberger" wrote: > On Fri, 2008-10-24 at 15:41 -0700, Greg Hellings wrote: >> Does anyone know if the VIP in a LVS-DR config has to be on the same subnet >> as the RIP? > > If I understand the question.... > > Yes. All realservers and the director's VIP need to be on the same > subnet. > > I usually put the VIP on the realservers' public NICs and use > arptables_jf to prevent the VIPs from sending/receiving ARP requests for > the VIP. > > One trick you can do lets you put the VIP on the realservers on lo:0, > but I've never done it. Either way, the realservers' "real" IP needs to > be on the same subnet as the VIP. > > -- Lon > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From oioi at cableplus.com.cn Tue Oct 28 05:46:19 2008 From: oioi at cableplus.com.cn (Lu Wen-yan) Date: Tue, 28 Oct 2008 13:46:19 +0800 Subject: [Linux-cluster] cman killed by node 2 for reason 2 In-Reply-To: <49058BA2.5000505@redhat.com> References: <804362282.20081027103954@cableplus.com.cn> <49058BA2.5000505@redhat.com> Message-ID: <55153690.20081028134619@cableplus.com.cn> Hello Christine, Can you tell me what is the problem? I have many servers in production. Is it safe to upgrade cluster? Thanks Monday, October 27, 2008, 5:36:34 PM, you wrote: CC> Lu Wen-yan wrote: >> Hello linux-cluster, >> >> Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service. >> Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state. 
>> Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.1 >> Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.2 >> Oct 26 18:45:08 cms2 openais[13904]: [CPG ] got joinlist message from node 1 >> [b][color=Red]Oct 26 18:45:08 cms2 openais[13904]: [CMAN ] cman killed by node 2 for reason 2 >> >> I get an error msg when I restart cman. Anyone know what is " reason 2 " ? >> CC> It means you have a very old version of cman that needs updating ;-) CC> That message as from 5.0 and lots of things have been fixed (including CC> that error) since then . CC> Chrissie -- Best regards, Lu mailto:oioi at cableplus.com.cn From ccaulfie at redhat.com Tue Oct 28 08:39:25 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 28 Oct 2008 08:39:25 +0000 Subject: [Linux-cluster] cman killed by node 2 for reason 2 In-Reply-To: <55153690.20081028134619@cableplus.com.cn> References: <804362282.20081027103954@cableplus.com.cn> <49058BA2.5000505@redhat.com> <55153690.20081028134619@cableplus.com.cn> Message-ID: <4906CFBD.2030603@redhat.com> Lu Wen-yan wrote: > Hello Christine, > > Can you tell me what is the problem? > I have many servers in production. Is it safe to upgrade cluster? > > Thanks > > > Monday, October 27, 2008, 5:36:34 PM, you wrote: > > CC> Lu Wen-yan wrote: >>> Hello linux-cluster, >>> >>> Oct 26 18:45:08 cms2 openais[13904]: [SYNC ] This node is within the primary component and will provide service. >>> Oct 26 18:45:08 cms2 openais[13904]: [TOTEM] entering OPERATIONAL state. >>> Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.1 >>> Oct 26 18:45:08 cms2 openais[13904]: [CLM ] got nodejoin message 192.168.201.2 >>> Oct 26 18:45:08 cms2 openais[13904]: [CPG ] got joinlist message from node 1 >>> [b][color=Red]Oct 26 18:45:08 cms2 openais[13904]: [CMAN ] cman killed by node 2 for reason 2 >>> >>> I get an error msg when I restart cman. Anyone know what is " reason 2 " ? >>> Reason "2" is that someone issued a cman_tool kill command on another node. So it's nothing wrong with the cluster that has caused that message. I do strongly recommend you upgrade. There have been a substantial number of fixes to all aspects of cluster suite since RHEL 5.0. > CC> It means you have a very old version of cman that needs updating ;-) > CC> That message as from 5.0 and lots of things have been fixed (including > CC> that error) since then . > > CC> Chrissie > > > -- Chrissie From afahounko at gmail.com Tue Oct 28 15:27:24 2008 From: afahounko at gmail.com (AFAHOUNKO Danny) Date: Tue, 28 Oct 2008 15:27:24 +0000 Subject: [Linux-cluster] Cluster Two nodes - Software Installation Message-ID: <49072F5C.4060002@gmail.com> Hi, I'm newbees in Clustering. I've installed a cluster with two nodes without a share storage. I want i know if it's possible to install a software (apache, exim,...) once, and it will be automaticaly deployed on the two nodes ?! I'm using RedHat 5.1 Advanced Plateform with RedHat Cluster Suite. Thanks for helps. 
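With no shared storage in the picture, the usual answer to a question like this one is simply to install the same package set on every node and let rgmanager move the service between them. A rough sketch, with hypothetical node names and the package names taken from the question (what is actually installable depends on the configured repositories):

    for n in node1 node2; do
        ssh root@$n "yum -y install httpd exim"
    done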
-- Cordialement AFAHOUNKO Danny Administrateur Réseaux & Système d'Information - CICA-RE Gsm: +228 914.55.89 Tel: +228 223.62.62 From raju.rajsand at gmail.com Tue Oct 28 17:03:28 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Tue, 28 Oct 2008 22:33:28 +0530 Subject: [Linux-cluster] Cluster Two nodes - Software Installation In-Reply-To: <49072F5C.4060002@gmail.com> References: <49072F5C.4060002@gmail.com> Message-ID: <8786b91c0810281003w7aa893f3h9a5e71f16f6ea81d@mail.gmail.com> Greetings On Tue, Oct 28, 2008 at 8:57 PM, AFAHOUNKO Danny wrote: > I'm newbees in Clustering. I've installed a cluster with two nodes without > a share storage. > I want i know if it's possible to install a software (apache, exim,...) > once, and it will be automaticaly deployed on the two nodes ?! > I'm using RedHat 5.1 Advanced Plateform with RedHat Cluster Suite. > On the face of it, no. C'mon, how can two different OS images find the same binary when the storage is not available in shared mode? (NFS, one of the possible options in your case, is considered shared storage, but it could be a pain to configure for what you want, and it is just not worth it.) Assuming both nodes are identical in hardware, why not install all the packages on, say, node 1 using yum, then copy /var/cache/yum from that node and do a yum install on node 2? This may not be painful for a small number of nodes but can get unwieldy if the cluster is large, in which case one should contemplate using kickstart _before_ the RHEL install. IOW, plan properly. HTH Regards Rajagopal -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Tue Oct 28 17:32:07 2008 From: gordan at bobich.net (Gordan Bobic) Date: Tue, 28 Oct 2008 17:32:07 +0000 Subject: [Linux-cluster] Cluster Two nodes - Software Installation Message-ID: <4906F1690017E8BB@> (added by postmaster@mail.o2.co.uk) You can set two machines up with shared root storage using Open Shared Root From jds at techma.com Tue Oct 28 20:40:52 2008 From: jds at techma.com (Simmons, Dan A) Date: Tue, 28 Oct 2008 16:40:52 -0400 Subject: [Linux-cluster] Cluster with kernel-smp nodes and hugemem nodes Message-ID: <79CEFE3C5C43714D9170E3138DC09935A36895@TMAEMAIL.techma.com> Hi All, I have a 12 node Redhat 4.7 cluster and I want to run 3 nodes with the hugemem kernel while keeping the rest of the nodes running the smp kernel. Is there anything I have to worry about if I do this? J. Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodrique.heron at baruch.cuny.edu Tue Oct 28 22:45:05 2008 From: rodrique.heron at baruch.cuny.edu (Rodrique Heron) Date: Tue, 28 Oct 2008 18:45:05 -0400 Subject: [Linux-cluster] Multiple network path for cluster traffic Message-ID: <20081028224115.E98BF15EC27@smtp25.baruch.cuny.edu> Hello all- Is it necessary to provide redundant paths for cluster traffic? My server has six network interfaces; I would like to dedicate two for cluster traffic, and both interfaces will be connected to separate switches. Is there a recommended way of setting this up so I can restrict all cluster traffic through the two interfaces? Should I bond both interfaces? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.costakos at gmail.com Wed Oct 29 01:22:50 2008 From: david.costakos at gmail.com (Dave Costakos) Date: Tue, 28 Oct 2008 18:22:50 -0700 Subject: [Linux-cluster] Cluster/GFS issue.
In-Reply-To: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> Message-ID: <6b6836c60810281822g554650c8y5a54fcfec9ea9520@mail.gmail.com> Usually, when I hear about problems like this, it is often a multicast issue -- at least from my experience. Can you confirm that cman is able to talk on your multicast address? If not, I suggest specifying a multicast address in the 224.0.0.111 - 250. This will require that the whole cluster be reset. I would avoid putting startup commands in /etc/rc.local as some suggest -- seems like a red herring to me. The init scripts should work fine (they do for me on our 3 8-node clusters. -Dave. 2008/10/23 Allgood, John > Hello All > > > > I am having some issues with building an eight node Xen cluster. Let me > give some background first. We have 8 dell PE 1950 with 32GB RAM connected > via dual brocade fiber switchs to an EMC CX-310. The guests images are being > stored on the SAN. We are using EMC Powerpath to hand the multipathing. The > Operating system is Redhat Advanced Platform 5.2 . The filesystems on the > SAN were created using Conga CLVM/GFS1. We have the heartbeat on an separate > private network. The fence devices are Dell DRAC's. > > Here is the problem that we are having. We can't on an consistent basic > get the GFS filesystem mounted. On the nodes that don't connect it will just > hang on bootup trying to mount the GFS filesystem. All nodes come up and > join the cluster at this point but only 1 or 2 will completely come up with > the GFS filesystem mounted. If we do an interactive startup and skip the GFS > part all systems will come up on the cluster but without the gfs mounted. > > At this point I am not sure what to do next. I am thinking it may be a > problem with the way the GFS filesystem was created. We just used the > default settings. The LVM is 668GB created from an RAID10. > > > > Best Regards > > > > *John Allgood** > **Senior Systems Administrator** > **Turbo, division of OHL** > **2251 Jesse Jewell Pky. NE** > **Gainesville, GA 30507** > **tel: (678) 989-3051 fax: (770) 531-7878** > **jallgood at ohl.com* > > *www.ohl.com*** > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Dave Costakos mailto:david.costakos at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From raju.rajsand at gmail.com Wed Oct 29 04:58:07 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Wed, 29 Oct 2008 10:28:07 +0530 Subject: [Linux-cluster] Cluster Two nodes - Software Installation In-Reply-To: <7378413047924619566@unknownmsgid> References: <7378413047924619566@unknownmsgid> Message-ID: <8786b91c0810282158l3aef4edfq322eece085cd1e1f@mail.gmail.com> On Tue, Oct 28, 2008 at 11:02 PM, Gordan Bobic wrote: > You can set two machines up with shared root storage using Open Shared Root > AFAIK, the prerequisite for Open Shared Root is a shared storage from URL: http://www.open-sharedroot.org/documentation/the-opensharedroot-mini-howto#prerequesits [quote] 1. You should have at least two servers connected to some kind of storage network. Both servers need to have concurrent access to at least one better two logical units (LUNS). [unquote] So, IMHO, without some type of storage accessible to both nodes, as is the case quoted in the original post, It is impossible. 
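Returning to the multicast suggestion in the Cluster/GFS thread above, one way to pin cman to a known address is an explicit entry in cluster.conf; the address below is only an example in the range mentioned, and config_version has to be bumped and the change propagated as usual:

    <cman>
        <multicast addr="224.0.0.120"/>
    </cman>

After the whole cluster has been restarted, cman_tool status on each node should show the multicast address actually in use, which makes it easy to spot a node that did not pick up the change.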
Regards, Rajagopal -------------- next part -------------- An HTML attachment was scrubbed... URL: From raju.rajsand at gmail.com Wed Oct 29 05:07:06 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Wed, 29 Oct 2008 10:37:06 +0530 Subject: [Linux-cluster] Cluster/GFS issue. In-Reply-To: <6b6836c60810281822g554650c8y5a54fcfec9ea9520@mail.gmail.com> References: <82E499DEBAB95F4E91140984379FB1C6046DD7@NOC-ML-09.ohlogistics.com> <6b6836c60810281822g554650c8y5a54fcfec9ea9520@mail.gmail.com> Message-ID: <8786b91c0810282207y44d40a5brf150cc1b6c97d90c@mail.gmail.com> Greetings, 2008/10/29 Dave Costakos > Usually, when I hear about problems like this, it is often a multicast > issue -- at least from my experience. > > Yes, that is one possibility that must be checked. > I would avoid putting startup commands in /etc/rc.local as some suggest -- > seems like a red herring to me. Trust me,it is not a Red Herring. This method worked for a three node cluster (one node had a different configuration) as this ensures that any device drivers (Like SAS DAS box Which I came across once) which are not "burnt" into initrd but are later loaded when the full system is booted. The rc.local method worked reliably compared to the /etc/fstab entries. But then YMMV. Regards, Rajagopal -------------- next part -------------- An HTML attachment was scrubbed... URL: From raju.rajsand at gmail.com Wed Oct 29 07:01:08 2008 From: raju.rajsand at gmail.com (Rajagopal Swaminathan) Date: Wed, 29 Oct 2008 12:31:08 +0530 Subject: [Linux-cluster] cmirror Message-ID: <8786b91c0810290001h1b9c9534k947a9e8e8299d151@mail.gmail.com> Greetings, Could somebody point to some introductory material/doc on cmirror please Regards Rajagopal -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmacfarland at nexatech.com Wed Oct 29 13:07:45 2008 From: jmacfarland at nexatech.com (Jeff Macfarland) Date: Wed, 29 Oct 2008 08:07:45 -0500 Subject: [Linux-cluster] Multiple network path for cluster traffic In-Reply-To: <20081028224115.E98BF15EC27@smtp25.baruch.cuny.edu> References: <20081028224115.E98BF15EC27@smtp25.baruch.cuny.edu> Message-ID: <49086021.3080505@nexatech.com> Rodrique Heron wrote: > Hello all- > > Is it necessary to provide redundant paths for cluster traffic? > > My server as six network interface, I would like to dedicate two for > cluster traffic, both interfaces will be connected to separate switches. > Is there a recommended way of setting this up so I can restrict all > cluster traffic through the two interfaces? Should I bond both interfaces? > > Thanks > Red Hat clustering currently only supports once interface for cluster traffic. If you want to use multiple interfaces, you must use bonding. -- Jeff Macfarland (jmacfarland at nexatech.com) Nexa Technologies - 972.747.8879 Systems Administrator GPG Key ID: 0x5F1CA61B GPG Key Server: hkp://wwwkeys.pgp.net From jralph at intertechmedia.com Thu Oct 30 17:37:14 2008 From: jralph at intertechmedia.com (Jason Ralph) Date: Thu, 30 Oct 2008 13:37:14 -0400 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" Message-ID: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> Hello List, We currently have in production a two node cluster with a shared SAS storage device. Both nodes are running RHEL5 AP and are connected directly to the storage device via SAS. We also have configured a high availability NFS service directory that is being exported out and is mounted on multiple other linux servers. 
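Picking up the bonding answer in the network-path thread above, an active-backup bond over the two cluster-traffic NICs on RHEL 5 is usually assembled from ifcfg files along these lines; interface names and addresses are examples only:

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=1 miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.10.10.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth4 (and the same for ifcfg-eth5)
    DEVICE=eth4
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

Cluster traffic then follows whichever address the cluster node name resolves to, so the node names in cluster.conf should resolve to the bond0 addresses.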
The problem that I am seeing is: FIle and folders that are using the GFS filesystem and live on the storage device are mysteriously getting lost. My first thought was that maybe one of our many users has deleted them. So I have revoked the users privilleges and it is still happening. My other tought was that a rsync script may have overwrote these files or deleted them. I have stopped all scripting and crons and it has happened again. Can someone help me with a command or a log to view that would show me where any of these folders may have gone? Or has anyone else ever run into this type of data loss using the similar setup? Regards, -- Jason R. Ralph Systems Administrator Intertech Media LLC 20 Summer Street - Floor 5 Stamford CT 06901 (203) 967 - 1800 x 122 jralph at intertechmedia.com This transmittal may be a confidential communication or may otherwise be privileged or confidential. If it is not clear that you are the intended recipient, you are hereby notified that you have received this transmittal in error; any review, dissemination, distribution or copying of this transmittal is strictly prohibited. If you suspect that you have received this communication in error, please notify us immediately by telephone at 1-203-967-1800 x 114, or e-mail at it at intertechmedia.com and immediately delete this message and all its attachments. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.zg at gmail.com Thu Oct 30 20:57:29 2008 From: alan.zg at gmail.com (Alan A) Date: Thu, 30 Oct 2008 15:57:29 -0500 Subject: [Linux-cluster] APC Power switch question Message-ID: Hello everyone! I have a few short questions. We just acquired 2 APC Power Switches. Our clustered servers have two power supplies so each APC switch supplies/supports one server power supply. Example: dev02 power supply 1 - APC switch 1 dev02 power supply 2 - APC switch 2 Question: I am trying to complete CONGA setup - and all is clear in the first box: Name - got it IP - got it Login - got it Password got it What I do not understand is what is: 'port' stand for - is that the port fence_apc is connecting to APC power switch - or is that the number of the outlet. What is switch(optional) mean? I repeat this is in CONGA! Thanks for the fast help. -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jparsons at redhat.com Thu Oct 30 21:32:02 2008 From: jparsons at redhat.com (jim parsons) Date: Thu, 30 Oct 2008 17:32:02 -0400 Subject: [Linux-cluster] APC Power switch question In-Reply-To: References: Message-ID: <1225402322.3319.4.camel@localhost.localdomain> On Thu, 2008-10-30 at 15:57 -0500, Alan A wrote: > > Hello everyone! > > I have a few short questions. We just acquired 2 APC Power Switches. > Our clustered servers have two power supplies so each APC switch > supplies/supports one server power supply. Example: > dev02 power supply 1 - APC switch 1 > dev02 power supply 2 - APC switch 2 > > Question: > I am trying to complete CONGA setup - and all is clear in the first > box: > Name - got it > IP - got it > Login - got it > Password got it > > What I do not understand is what is: 'port' stand for - is that the > port fence_apc is connecting to APC power switch - or is that the > number of the outlet. It is the outlet number on the switch...or the name of the outlet if you have assigned a name to it using the APC firmware application > What is switch(optional) mean? Certain APC switch models can be ganged together. 
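To make the port and switch fields concrete, this is roughly the shape of the cluster.conf fencing that results for a node fed from two APC switches; the names, addresses, logins and outlet numbers are invented, port is simply the outlet that feeds the node on each switch, and switch= only matters when several units are ganged behind one address:

    <clusternode name="dev02" nodeid="2" votes="1">
        <fence>
            <method name="1">
                <device name="apc1" port="2"/>
                <device name="apc2" port="2"/>
            </method>
        </fence>
    </clusternode>
    ...
    <fencedevices>
        <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.61" login="apc" passwd="secret"/>
        <fencedevice agent="fence_apc" name="apc2" ipaddr="10.0.0.62" login="apc" passwd="secret"/>
    </fencedevices>

With dual power supplies it is worth testing the finished config with fence_node; depending on the version, each device may need to be listed twice with explicit off and on actions so that both feeds are cut before either is restored.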
If you are using the switches standalone (which you are, it seems from the above) just leave this field blank. -j From alan.zg at gmail.com Thu Oct 30 21:38:34 2008 From: alan.zg at gmail.com (Alan A) Date: Thu, 30 Oct 2008 16:38:34 -0500 Subject: [Linux-cluster] APC Power switch question In-Reply-To: <1225402322.3319.4.camel@localhost.localdomain> References: <1225402322.3319.4.camel@localhost.localdomain> Message-ID: Thanks for the answer. I have actually named switches in a 3 node cluster and will set them up accordingly. THis is how cluster.conf looks like, I am finishing the setup for dev03. On Thu, Oct 30, 2008 at 4:32 PM, jim parsons wrote: > On Thu, 2008-10-30 at 15:57 -0500, Alan A wrote: > > > > Hello everyone! > > > > I have a few short questions. We just acquired 2 APC Power Switches. > > Our clustered servers have two power supplies so each APC switch > > supplies/supports one server power supply. Example: > > dev02 power supply 1 - APC switch 1 > > dev02 power supply 2 - APC switch 2 > > > > Question: > > I am trying to complete CONGA setup - and all is clear in the first > > box: > > Name - got it > > IP - got it > > Login - got it > > Password got it > > > > What I do not understand is what is: 'port' stand for - is that the > > port fence_apc is connecting to APC power switch - or is that the > > number of the outlet. > It is the outlet number on the switch...or the name of the outlet if you > have assigned a name to it using the APC firmware application > > What is switch(optional) mean? > Certain APC switch models can be ganged together. If you are using the > switches standalone (which you are, it seems from the above) just leave > this field blank. > > -j > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pronix.service at gmail.com Thu Oct 30 22:13:09 2008 From: pronix.service at gmail.com (pronix pronix) Date: Thu, 30 Oct 2008 22:13:09 -0000 Subject: [Linux-cluster] Can clustered RHEL 5 use a SAN with different access rights for different nodes in the cluster? In-Reply-To: References: Message-ID: <639ce0480806171423u4503665ewd7426080145309ea@mail.gmail.com> yes , you can deploy than without gfs,but with gfs2 better readonly access implement by anonymous (read only) users. failover possible create - enough 2 nodes and drbd 2008/6/18 Richard Williams - IoDynamix : > Please advise and/or redirect this posting if this is not the correct forum > for my question - thanks. > > A company wants to use clustered rhel5 systems as inside/outside ftp > servers. Users on the inside (LAN) cluster nodes can read and write to the > SAN, while users on the outside (DMZ) cluster can only read. > > Is this application possible without GFS? > > If one node in the cluster fails, can the other node be provisioned to > provide all services until recovery? > > Can a SAN be used as the "single" ftp location for both services (inside > FTP > & outside FTP?) > > Does the customer need more than four systems (i.e. 2 inside - 2 outside) - > is a separate "command" system required? > > > Have Dell's m1000e & 600 series blades been certified for this operating > system? > > Is there any documentation available regarding separate access rights for > multiple nodes in a cluster available? > > Thanks for your constructive reply. 
> > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Fri Oct 31 02:02:00 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 30 Oct 2008 21:02:00 -0500 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> Message-ID: <490A6718.8000700@gmail.com> Jason Ralph wrote: > Hello List, > > We currently have in production a two node cluster with a shared SAS > storage device. Both nodes are running RHEL5 AP and are connected > directly to the storage device via SAS. We also have configured a > high availability NFS service directory that is being exported out and > is mounted on multiple other linux servers. > > The problem that I am seeing is: > FIle and folders that are using the GFS filesystem and live on the > storage device are mysteriously getting lost. My first thought was > that maybe one of our many users has deleted them. So I have revoked > the users privilleges and it is still happening. My other tought was > that a rsync script may have overwrote these files or deleted them. I > have stopped all scripting and crons and it has happened again. > > Can someone help me with a command or a log to view that would show me > where any of these folders may have gone? Or has anyone else ever run > into this type of data loss using the similar setup? > I don't (or "didn't") have adequate involvements with RHEL5 GFS. I may not know enough to response. However, ...... Before RHEL 5.1 and/or community version 2.6.22 kernels, NFS lock (via flock, fcntl, etc from client ends) is not populated into filesystem layer. It only reaches Linux VFS layer (local to one particular server). If your file access needs to get synchronized by either flock or posix fcntl *between multiple hosts (NFS servers)*, data loss could occur. Newer versions of RHEL and 2.6.22-and-after kernels should have the fixes. There was an old write-up in section 4.1 of "http://people.redhat.com/wcheng/Project/nfs.htm" about this issue. -- Wendy From fdinitto at redhat.com Fri Oct 31 08:27:06 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 31 Oct 2008 09:27:06 +0100 (CET) Subject: [Linux-cluster] Cluster 2.99.12 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 2.99.12 release from the master branch. Important note: If you are running 2.99.xx series, please upgrade immediatly to this version. This release addresses several security issues. The development cycle for 3.0 is proceeding at a very good speed and mostlikely one of the next releases will be 3.0alpha1. All features designed for 3.0 are being completed and taking a proper shape, the library API has been stable for sometime (and will soon be marked as 3.0 soname). Stay tuned for upcoming updates! The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. 
Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.11 release you will need: - - corosync svn r1677 (porting to newer corosync is in progress). - - openais svn r1656. - - linux kernel (2.6.27) The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.12.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.99.12.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.11): Christine Caulfield (1): cman: fix two_node startup if -e is specified David Teigland (4): dlm_controld: fix plock dump groupd/fenced/dlm_controld/gfs_controld: init logging after fork gfs_controld: move log_error message fenced/dlm_controld/gfs_controld: query thread mutex Fabio M. Di Nitto (40): misc: cleanup copyright.... again misc: fix gfs2_edit build fence: update man page for fence_apc gfs2: randomize debugfs mount point gfs2: randomize file for savemeta operations gfs2: remove unused define rgmanager: randomize file for automatic data dump rgmanager: randomize ASEHAagent temp files rgmanager: move fs.sh log file where they belong rgmanager: move nfsclient.sh cache files where they belong rgmanager: move oracledb.sh log files where they belong build: reinstate targets in rgmanager metadata check rgmanager: randomize SAPDatabase temp file libgfs2: randomize creation of temporary directories for metafs mount xmlconfig: remove debugging fprintf ccs: implement config reload in legacy ccs cman: add /libccs/@next_handle support ccs: libccs major rework pass 1 ccs: libccs split ccs_lookup_nodename into extras.c ccs: libccs major rework pass 2 ccs: libccs major rework pass 3 ccs: libccs major rework pass 4 ccs: remove duplicate entry in internal header file ccs: libccs major rework pass 5 common: plug liblogthread in the system build: use standard syslog priority name rather than corosync ccs: add ccs_read_logging ccsais: fix buffer overflow when reading huge config files xmlconfig: fix buffer overflow when reading huge config files ccs: cleanup ccs_read_logging gfs2: randomize debugfs mount point even more gfs2: randomize file for savemeta operations even more rgmanager: move state dump file where it belongs rgmanager: randomize ASEHAagent temp files even more rgmanager: randomize SAPDatabase temp file even more rgmanager: randomize oracledb.sh temp file misc: fix mktemp usage rgmanager: randomize smb.sh temp file rgmanager: randomize svclib_nfslock temp dir gfs2: randomize creation of temporary directories for metafs mount more Jan Friesse (6): [fence] Fence agent for ePowerSwitch 8M+ (fence_eps) [fence] Fixed man pages makefile, so fence_eps.8 is now installed. fence: Added support for no_password in fence agents library and fence_eps. fence: Fixed case sensitives in action parameter. fence: Fix -C switch description in Python library fence: Operation 'list' and 'monitor' for Alom, LDOM, VMware and ePowerSwitch Jim Meyering (8): don't dereference NULL upon failed realloc * fence/agents/xvm/ip_lookup.c (add_ip): Handle malloc failure. 
* gfs/gfs_fsck/inode.c (check_inode): handle failed malloc remove dead code (useless test of memset return value) add comments marking unchecked malloc calls Remove unused local variable, buf, add comments marking unchecked strdup calls handle some malloc failures Jonathan Brassow (1): rgmanager (HALVM): Stop dumping debug output to /tmp Marek 'marx' Grac (4): [fence] Operation 'list' and 'monitor' for iLO, DRAC5 and APC [fence] Operation 'list' and 'monitor' for WTI IPS 800-CE [fence] WTI should not power on/off plug if it is unable to get status [fence] WTI should not power on/off plug if it is unable to get status Simone Gotti (1): [rgmanager] Fix fuser parsing on later versions of psmisc Makefile | 4 +- cman/daemon/cman-preconfig.c | 17 + cman/daemon/cmanconfig.c | 44 +- common/Makefile | 4 + common/liblogthread/Makefile | 13 + common/liblogthread/liblogthread.c | 222 +++++ common/liblogthread/liblogthread.h | 17 + config/libs/libccsconfdb/Makefile | 7 +- config/libs/libccsconfdb/ccs.h | 5 + config/libs/libccsconfdb/ccs_internal.h | 29 + config/libs/libccsconfdb/extras.c | 382 ++++++++ config/libs/libccsconfdb/fullxpath.c | 334 +++++++ config/libs/libccsconfdb/libccs.c | 1181 ++++++++--------------- config/libs/libccsconfdb/xpathlite.c | 424 ++++++++ config/plugins/ccsais/config.c | 28 +- config/plugins/ldap/configldap.c | 11 - config/plugins/xml/config.c | 100 +- config/tools/ccs_tool/editconf.c | 1 + config/tools/ldap/confdb2ldif.c | 9 - configure | 25 +- doc/COPYRIGHT | 9 +- fence/agents/alom/fence_alom.py | 2 +- fence/agents/apc/fence_apc.py | 4 +- fence/agents/apc_snmp/fence_apc_snmp.py | 2 +- fence/agents/baytech/fence_baytech.pl | 2 +- fence/agents/bladecenter/Makefile | 13 - fence/agents/bladecenter/fence_bladecenter.py | 3 +- fence/agents/drac/fence_drac5.py | 2 +- fence/agents/eps/Makefile | 5 + fence/agents/eps/fence_eps.py | 112 +++ fence/agents/ilo/fence_ilo.py | 2 +- fence/agents/ldom/fence_ldom.py | 34 +- fence/agents/lib/fencing.py.py | 53 +- fence/agents/rsa/fence_rsa.py | 18 +- fence/agents/rsb/fence_rsb.py | 18 +- fence/agents/vmware/fence_vmware.py | 66 +- fence/agents/wti/fence_wti.py | 16 +- fence/agents/xcat/fence_xcat.pl | 2 + fence/agents/xvm/ip_lookup.c | 2 + fence/fenced/main.c | 13 +- fence/fenced/recover.c | 33 +- fence/man/Makefile | 3 +- fence/man/fence_alom.8 | 10 +- fence/man/fence_apc.8 | 8 +- fence/man/fence_baytech.8 | 4 +- fence/man/fence_eps.8 | 106 ++ fence/man/fence_ibmblade.8 | 2 +- fence/man/fence_rsa.8 | 4 +- fence/man/fence_rsb.8 | 4 +- fence/man/fence_vmware.8 | 10 +- gfs-kernel/src/gfs/lm_interface.h | 9 - gfs-kernel/src/gfs/lock_dlm.h | 9 - gfs-kernel/src/gfs/lock_dlm_lock.c | 9 - gfs-kernel/src/gfs/lock_dlm_main.c | 9 - gfs-kernel/src/gfs/lock_dlm_mount.c | 9 - gfs-kernel/src/gfs/lock_dlm_sysfs.c | 9 - gfs-kernel/src/gfs/lock_dlm_thread.c | 9 - gfs-kernel/src/gfs/lock_nolock_main.c | 9 - gfs-kernel/src/gfs/locking.c | 9 - gfs/gfs_fsck/block_list.c | 5 +- gfs/gfs_fsck/fs_dir.c | 4 + gfs/gfs_fsck/inode.c | 6 +- gfs/gfs_fsck/super.c | 3 + gfs/libgfs/fs_dir.c | 4 + gfs/libgfs/inode.c | 1 + gfs/libgfs/super.c | 1 + gfs/tests/filecon2/filecon2_server.c | 2 +- gfs2/edit/hexedit.c | 3 +- gfs2/edit/hexedit.h | 4 +- gfs2/edit/savemeta.c | 19 +- gfs2/fsck/initialize.c | 1 + gfs2/libgfs2/libgfs2.h | 4 - gfs2/libgfs2/misc.c | 117 +-- gfs2/libgfs2/super.c | 1 + gfs2/mkfs/main_grow.c | 4 +- gfs2/mkfs/main_jadd.c | 7 +- gfs2/quota/check.c | 12 +- gfs2/quota/gfs2_quota.h | 3 - gfs2/quota/main.c | 25 +- gfs2/tool/df.c | 4 +- gfs2/tool/misc.c | 36 +- 
gnbd/tools/gnbd_export/gnbd_export.c | 4 + group/daemon/app.c | 3 + group/daemon/cpg.c | 2 + group/daemon/joinleave.c | 1 + group/daemon/main.c | 7 +- group/dlm_controld/deadlock.c | 10 +- group/dlm_controld/main.c | 9 +- group/dlm_controld/plock.c | 3 + group/gfs_controld/cpg-new.c | 22 +- group/gfs_controld/crc.c | 12 - group/gfs_controld/main.c | 10 +- group/gfs_controld/plock.c | 2 + make/copyright.cf | 2 +- make/defines.mk.input | 3 + rgmanager/src/daemons/clurmtabd_lib.c | 1 + rgmanager/src/daemons/dtest.c | 2 + rgmanager/src/daemons/main.c | 4 +- rgmanager/src/resources/ASEHAagent.sh | 11 +- rgmanager/src/resources/Makefile | 33 +- rgmanager/src/resources/SAPDatabase | 2 +- rgmanager/src/resources/clusterfs.sh | 4 +- rgmanager/src/resources/fs.sh | 1304 ------------------------- rgmanager/src/resources/fs.sh.in | 1304 +++++++++++++++++++++++++ rgmanager/src/resources/lvm_by_vg.sh | 2 +- rgmanager/src/resources/netfs.sh | 4 +- rgmanager/src/resources/nfsclient.sh | 7 +- rgmanager/src/resources/oracledb.sh | 887 ----------------- rgmanager/src/resources/oracledb.sh.in | 888 +++++++++++++++++ rgmanager/src/resources/smb.sh | 2 +- rgmanager/src/resources/svclib_nfslock | 3 +- 111 files changed, 4805 insertions(+), 3509 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSQrBYQgUGcMLQ3qJAQIk0BAAlZFrEgXWy8RxvHrXuvScIgutOjd+Bjgj 2cPdoaFZjeLSroWifZJNHjjYfSG/FcpZug/NJxall3xVicmwc/CUljJtgqRJeN6x 0VWTFyC7GJrg6pnzEnTyriggBpaGDZZgnbLisV2gmIqFuDmiEVHAqnoYWl+dU9dj xaPq01LrVXzYhVb18DYqglCWl5LmHQQyTmhDh5pvUwbwZd/fsdr4WI7gkcgxA1Uw Hy+pbVMWIgRTBH+YEDH2j28pynvaNLvUopLBPHFLGY971vLhYldGzUmQubm04J7O ocQl0Q9qxuSVCqrCIpQ/Ty+V0x0begzahaczccdJAXVyxti2owKS4FX8OqLQPHo0 plFIx4g8hJSxX4zgfh/P7Fb48ePlGN6WE07o/2mO1vplfEOpnQ2xoWYFsDCaoSjO W2bETI+xT+E+UpKTI0w1j5/mfo/8kJ79WmDlZZujuwrM6/1iMJVTWbffqZkbGMcj ukl0B3q5VkFo4NOTtZJHOUfhC3+2QglfyhT09Fxhp1eqMiAFZDWWqEQxHC7dbtAv xu8KRCQiR4hVEZLNnLaoAIYlWABVAz1Ltux52uDFuul/jusxDpqjlp1cT54+j+ss h1wwlxgyFyisYCXxnAiRkECKjttcOG4FrVAA4k3fOl8u6F0Suw+GJdS2cESeXtce D2lFBpyTapQ= =JxFs -----END PGP SIGNATURE----- From fdinitto at redhat.com Fri Oct 31 08:58:52 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 31 Oct 2008 09:58:52 +0100 (CET) Subject: [Linux-cluster] Cluster 2.03.09 released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its vibrant community are proud to announce the 2.03.09 release from the STABLE2 branch. The STABLE2 branch collects, on a daily base, all bug fixes and the bare minimal changes required to run the cluster on top of the most recent Linux kernel (2.6.27) and rock solid openais (0.80.3). This release addresses several security issues. Please consider upgrading as soon as possible. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.09.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.03.09.tar.gz To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.03.08): Christine Caulfield (3): dlmtop: Add usage message cman: fix two_node startup if -e is specified dlmtop: fix some typos. Fabio M. Di Nitto (26): misc: cleanup copyright.... again... and again... 
misc: fix gfs2_edit build cman: fix buffer overflow when reading huge config files fence: update man page for fence_apc gfs2: randomize debugfs mount point gfs2: randomize debugfs mount point even more gfs2: randomize file for savemeta operations gfs2: randomize file for savemeta operations even more gfs2: remove unused define rgmanager: randomize file for automatic data dump rgmanager: move state dump file where it belongs rgmanager: randomize ASEHAagent temp files rgmanager: randomize ASEHAagent temp files even more rgmanager: move fs.sh log file where they belong rgmanager: move nfsclient.sh cache files where they belong rgmanager: move oracledb.sh log files where they belong build: reinstate targets in rgmanager metadata check rgmanager: randomize SAPDatabase temp file rgmanager: randomize SAPDatabase temp file even more libgfs2: randomize creation of temporary directories for metafs mount rgmanager: randomize oracledb.sh temp file misc: fix mktemp usage rgmanager: randomize smb.sh temp file rgmanager: randomize svclib_nfslock temp dir ccs_tool: randomize temporary file gfs2: randomize creation of temporary directories for metafs mount more Jan Friesse (4): [fence] Fence agent for ePowerSwitch 8M+ (fence_eps) [fence] Fixed man pages makefile, so fence_eps.8 is now installed. fence: Added support for no_password in fence agents library and fence_eps. fence: Fixed case sensitives in action parameter. Jonathan Brassow (1): rgmanager (HALVM): Stop dumping debug output to /tmp Marek 'marx' Grac (2): [fence] WTI should not power on/off plug if it is unable to get status [fence] WTI should not power on/off plug if it is unable to get status ccs/ccs_tool/upgrade.c | 7 +- cman/daemon/ais.c | 8 +- cman/daemon/cmanccs.c | 70 +- cman/daemon/config.c | 49 +- config/copyright.cf | 2 +- dlm/tests/tcpdump/dlmtop.c | 84 ++- fence/agents/apc_snmp/README | 2 - fence/agents/apc_snmp/fence_apc_snmp.py | 2 +- fence/agents/baytech/fence_baytech.pl | 2 +- fence/agents/eps/Makefile | 5 + fence/agents/eps/fence_eps.py | 108 +++ fence/agents/lib/fencing.py.py | 27 +- fence/agents/lpar/Makefile | 13 - fence/agents/lpar/fence_lpar.py | 3 +- fence/agents/rsa/fence_rsa.py | 18 +- fence/agents/rsb/fence_rsb.py | 18 +- fence/agents/vmware/fence_vmware.py | 3 +- fence/agents/xcat/fence_xcat.pl | 2 + fence/man/Makefile | 3 +- fence/man/fence_alom.8 | 10 +- fence/man/fence_apc.8 | 8 +- fence/man/fence_baytech.8 | 4 +- fence/man/fence_eps.8 | 106 +++ fence/man/fence_ibmblade.8 | 2 +- fence/man/fence_rsa.8 | 4 +- fence/man/fence_rsb.8 | 4 +- fence/man/fence_vmware.8 | 10 +- gfs-kernel/src/gfs/lm_interface.h | 9 - gfs-kernel/src/gfs/lock_dlm.h | 9 - gfs-kernel/src/gfs/lock_dlm_lock.c | 9 - gfs-kernel/src/gfs/lock_dlm_main.c | 9 - gfs-kernel/src/gfs/lock_dlm_mount.c | 9 - gfs-kernel/src/gfs/lock_dlm_sysfs.c | 9 - gfs-kernel/src/gfs/lock_dlm_thread.c | 9 - gfs-kernel/src/gfs/lock_nolock_main.c | 9 - gfs-kernel/src/gfs/locking.c | 9 - gfs2/edit/hexedit.c | 2 +- gfs2/edit/hexedit.h | 4 +- gfs2/edit/savemeta.c | 15 +- gfs2/libgfs2/libgfs2.h | 4 - gfs2/libgfs2/misc.c | 117 +--- gfs2/mkfs/main_grow.c | 4 +- gfs2/mkfs/main_jadd.c | 7 +- gfs2/quota/check.c | 12 +- gfs2/quota/gfs2_quota.h | 3 - gfs2/quota/main.c | 25 +- gfs2/tool/df.c | 4 +- gfs2/tool/misc.c | 36 +- rgmanager/src/daemons/main.c | 4 +- rgmanager/src/resources/ASEHAagent.sh | 11 +- rgmanager/src/resources/Makefile | 33 +- rgmanager/src/resources/SAPDatabase | 2 +- rgmanager/src/resources/fs.sh | 1304 ------------------------------- rgmanager/src/resources/fs.sh.in | 
1304 +++++++++++++++++++++++++++++++ rgmanager/src/resources/lvm_by_vg.sh | 2 +- rgmanager/src/resources/nfsclient.sh | 7 +- rgmanager/src/resources/oracledb.sh | 888 --------------------- rgmanager/src/resources/oracledb.sh.in | 889 +++++++++++++++++++++ rgmanager/src/resources/smb.sh | 2 +- rgmanager/src/resources/svclib_nfslock | 3 +- 60 files changed, 2732 insertions(+), 2605 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSQrI0QgUGcMLQ3qJAQJWPw/8Cx56Zw5mdKNUBTHqgGEF5UTySV3GLMMq KiLS8L2sAaKvAjP2hqntTJn+iKBQdH02hCLo0PLDKdXwY8TSFY8Ryu04340ElUKw ShnC6mxKs0Kc44X+jUiqG4gH7zEoeW0KdW514NFdyY41Jd7X6IXmIyGDgE+kCxjm T/n3HJv+3sNyCYbtHMBnnCnXj4e5Bp9lj5Dd0u0QiJWunDucX4x5DQrDZF7SmUaF QnyIEDU3AUB6TI2Yzg24BuWhXX4upaBX9LGOVn0Y4sLappZqrI/RFgN1A05mXnsd fRQWRDjsDcKBlU/+YKKNZaE2uefVHzshza0VOxlvqFtEbbDmjIRv+Bkw7L51C/nG Vxe4xNvXukg8GhiZCsPsP3Iv84nJaLnHkS1JqKAf8iZRfHGvlHXmzYBj462j+T/i RrpF3qmcCiwz12HI+MUkCNgkbVTA3LagSZKbiB1AYFWA+I+vksBTD1d9VgYSUIub vrrn2IhpsSRVbAsvVGO4lCGZJYRNza/d6c3bi8O0GG7JjN2I4ucGZs3yCgyjei1O 1bJSIxhL0COWhmJYaZnwhll1mYQ9td+BTu4BzF2Wd1NE94G2wE+/OnT4Xu4xzQiK Wse4BjuezGWbjooG0BLpAnZbiZfOHZnUGNMAlrkTELtg2c9ed3vvYZy4jdkmoxcY zkdU2QK+8xs= =Fqdo -----END PGP SIGNATURE----- From pk at nodex.ru Fri Oct 31 10:52:12 2008 From: pk at nodex.ru (Pavel Kuzin) Date: Fri, 31 Oct 2008 13:52:12 +0300 Subject: [Linux-cluster] Building error in Cluster 2.03.09 References: Message-ID: <0d5f01c93b46$baf5a4e0$a401a8c0@mainoffice.nodex.ru> Hello! I`m tryig to build cluster 2.03.09 against linux 2.6.27.4. When building have a error: upgrade.o: In function `upgrade_device_archive': /root/newcluster/cluster-2.03.09/ccs/ccs_tool/upgrade.c:226: undefined reference to `mkostemp' collect2: ld returned 1 exit status make[2]: *** [ccs_tool] Error 1 make[2]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs/ccs_tool' make[1]: *** [all] Error 2 make[1]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs' make: *** [ccs] Error 2 node2:~/newcluster/cluster-2.03.09# uname -a Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux -- Pavel D.Kuzin Nodex LTD. From pk at nodex.ru Fri Oct 31 11:13:05 2008 From: pk at nodex.ru (Pavel Kuzin) Date: Fri, 31 Oct 2008 14:13:05 +0300 Subject: [Linux-cluster] Fw: Building error in Cluster 2.03.09 Message-ID: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru> Hello! I`m trying to build cluster 2.03.09 against linux 2.6.27.4. When building have a error: upgrade.o: In function `upgrade_device_archive': /root/newcluster/cluster-2.03.09/ccs/ccs_tool/upgrade.c:226: undefined reference to `mkostemp' collect2: ld returned 1 exit status make[2]: *** [ccs_tool] Error 1 make[2]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs/ccs_tool' make[1]: *** [all] Error 2 make[1]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs' make: *** [ccs] Error 2 node2:~/newcluster/cluster-2.03.09# uname -a Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux Distro - Debian Etch Seems mkostemp is available since glibc 2.7. I have 2.6. Can "mkostemp" be changed to another similar function? -- Pavel D.Kuzin Nodex LTD. From mad at wol.de Fri Oct 31 11:19:35 2008 From: mad at wol.de (Marc - A. 
Dahlhaus [ Administration | Westermann GmbH ])
Date: Fri, 31 Oct 2008 12:19:35 +0100
Subject: [Linux-cluster] Building error in Cluster 2.03.09
In-Reply-To: <0d5f01c93b46$baf5a4e0$a401a8c0@mainoffice.nodex.ru>
References: <0d5f01c93b46$baf5a4e0$a401a8c0@mainoffice.nodex.ru>
Message-ID: <1225451975.3666.10.camel@marc>

Hello Pavel,

2.03.09 builds just fine against kernel 2.6.27.4, openais 0.84 and glibc 2.8
here. As mkostemp should be defined inside of /usr/include/stdlib.h, this
must be a problem with your local build environment.

Marc

On Friday, 31.10.2008 at 13:52 +0300, Pavel Kuzin wrote:
> Hello!
>
> I`m tryig to build cluster 2.03.09 against linux 2.6.27.4.
>
> When building have a error:
>
> upgrade.o: In function `upgrade_device_archive':
> /root/newcluster/cluster-2.03.09/ccs/ccs_tool/upgrade.c:226: undefined reference to `mkostemp'
> collect2: ld returned 1 exit status
> make[2]: *** [ccs_tool] Error 1
> make[2]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs/ccs_tool'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/root/newcluster/cluster-2.03.09/ccs'
> make: *** [ccs] Error 2
>
> node2:~/newcluster/cluster-2.03.09# uname -a
> Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux
>
> --
> Pavel D.Kuzin
> Nodex LTD.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From rodrique.heron at baruch.cuny.edu  Fri Oct 31 13:27:23 2008
From: rodrique.heron at baruch.cuny.edu (Rodrique Heron)
Date: Fri, 31 Oct 2008 09:27:23 -0400
Subject: [Linux-cluster] APC Power switch question
Message-ID: <20081031132328.BDDC715EC52@smtp25.baruch.cuny.edu>

I just acquired 2 APC Power Switches myself. My servers are Dell, so my plan
is to use the APC as a primary and the DRAC as a secondary fencing device.

My cluster has one interface for production traffic and another for cluster
traffic, which is a private non-routed network. Can the fencing devices be
connected to the production network, or do they have to be on the same
network as the cluster traffic?

Thanks

----- Original Message -----
From: linux-cluster-bounces at redhat.com
To: linux clustering
Sent: Thu Oct 30 17:38:34 2008
Subject: Re: [Linux-cluster] APC Power switch question

Thanks for the answer. I have actually named switches in a 3 node cluster
and will set them up accordingly. This is what cluster.conf looks like; I am
finishing the setup for dev03.

On Thu, Oct 30, 2008 at 4:32 PM, jim parsons wrote:

On Thu, 2008-10-30 at 15:57 -0500, Alan A wrote:
>
> Hello everyone!
>
> I have a few short questions. We just acquired 2 APC Power Switches.
> Our clustered servers have two power supplies so each APC switch
> supplies/supports one server power supply. Example:
> dev02 power supply 1 - APC switch 1
> dev02 power supply 2 - APC switch 2
>
> Question:
> I am trying to complete CONGA setup - and all is clear in the first
> box:
> Name - got it
> IP - got it
> Login - got it
> Password got it
>
> What I do not understand is what is: 'port' stand for - is that the
> port fence_apc is connecting to APC power switch - or is that the
> number of the outlet.

It is the outlet number on the switch...or the name of the outlet if you
have assigned a name to it using the APC firmware application

> What is switch(optional) mean?

Certain APC switch models can be ganged together. If you are using the
switches standalone (which you are, it seems from the above) just leave
this field blank.

-j

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-- 
Alan A.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
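As an illustration of the "port = outlet number" and two-switch arrangement discussed in
this thread, a fence_apc setup for a node with dual power supplies might look roughly like
the sketch below. The node name, IP addresses, logins, passwords and outlet numbers are
invented placeholders, and this is not anyone's actual configuration (the cluster.conf
attachments in this thread were scrubbed from the archive), so treat it only as a rough
starting point:

  <clusternode name="dev02" nodeid="2" votes="1">
    <fence>
      <method name="1">
        <!-- power off both feeds first, then power both back on,
             so the node is never left running on a single supply -->
        <device name="apc1" port="2" option="off"/>
        <device name="apc2" port="2" option="off"/>
        <device name="apc1" port="2" option="on"/>
        <device name="apc2" port="2" option="on"/>
      </method>
    </fence>
  </clusternode>
  ...
  <fencedevices>
    <!-- standalone switches: the optional "switch" attribute is simply omitted -->
    <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.61" login="apc" passwd="apc"/>
    <fencedevice agent="fence_apc" name="apc2" ipaddr="10.0.0.62" login="apc" passwd="apc"/>
  </fencedevices>

Here "port" is the outlet number (or outlet name) on each switch, exactly as described
above.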
From raju.rajsand at gmail.com  Fri Oct 31 17:18:29 2008
From: raju.rajsand at gmail.com (Rajagopal Swaminathan)
Date: Fri, 31 Oct 2008 22:48:29 +0530
Subject: [Linux-cluster] one click to start httpd on all nodes - possible?
In-Reply-To: <200809011154.43460.linux@vfemail.net>
References: <200808271301.57414.linux@vfemail.net> <38A48FA2F0103444906AD22E14F1B5A307F20245@mailxchg01.corp.opsource.net> <200809011154.43460.linux@vfemail.net>
Message-ID: <8786b91c0810311018v75fa7cfdyc0508baad7a29ba6@mail.gmail.com>

Greetings

On Mon, Sep 1, 2008 at 2:24 PM, Alex wrote:
> i need a "command center" to control (start/stop) a
> resourse/service globally in 2nd thier, on all N nodes.

Have you tried cssh? Not exactly a bells and whistles stuff, but can do what
you are describing

Regards

Rajagopal
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pk at nodex.ru  Fri Oct 31 17:32:45 2008
From: pk at nodex.ru (Pavel Kuzin)
Date: Fri, 31 Oct 2008 20:32:45 +0300
Subject: [Linux-cluster] Strange fenced error
References: 
Message-ID: <1a9e01c93b7e$afbc2170$a401a8c0@mainoffice.nodex.ru>

When node trying to fence another

Oct 31 20:36:01 node1 fenced[2634]: fencing node "node2"
Oct 31 20:36:05 node1 fenced[2634]: can't get node number for node ??^U^I~a^F?^P
Oct 31 20:36:05 node1 fenced[2634]: fence "node2" success

--
Pavel D.Kuzin
Nodex LTD.

From lhh at redhat.com  Fri Oct 31 18:31:54 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Fri, 31 Oct 2008 14:31:54 -0400
Subject: [Linux-cluster] Fw: Building error in Cluster 2.03.09
In-Reply-To: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru>
References: <0d9d01c93b49$a5a81fc0$a401a8c0@mainoffice.nodex.ru>
Message-ID: <1225477914.3194.159.camel@ayanami>

On Fri, 2008-10-31 at 14:13 +0300, Pavel Kuzin wrote:
> node2:~/newcluster/cluster-2.03.09# uname -a
> Linux node2 2.6.27.4 #2 SMP Fri Oct 31 13:42:09 MSK 2008 i686 GNU/Linux
>
> Distro - Debian Etch
>

Maybe...

> Seems mkostemp is available since glibc 2.7.
> I have 2.6.
> Can "mkostemp" be changed to another similar function?

#define mkostemp(val, flags) mkstemp(val)

?

Man page:

       int mkstemp(char *template);
       int mkostemp (char *template, int flags);
...
       mkostemp() is like mkstemp(), with the difference that flags as for
       open(2) may be specified in flags (e.g., O_APPEND, O_SYNC).

Not sure the implications of doing this; I didn't analyze the open flags
used.

-- Lon
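Lon's #define above simply discards the flags argument. If the flags actually matter at
that call site, another possible workaround on a pre-2.7 glibc (such as the one shipped
with Debian Etch) is to wrap mkstemp() and apply the flags after the file has been
created. This is only an illustrative sketch under that assumption, not the change that
went into the tree:

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Rough stand-in for mkostemp() where glibc does not provide it:
 * create the file with mkstemp() and try to apply the flags afterwards.
 * Unlike the real mkostemp(), the flags are not applied atomically at
 * open time, so whether this is good enough depends on which flags the
 * caller actually relies on.
 */
static int compat_mkostemp(char *template, int flags)
{
	int fd = mkstemp(template);	/* template must end in "XXXXXX" */

	if (fd < 0)
		return -1;

#ifdef O_CLOEXEC
	if (flags & O_CLOEXEC) {
		fcntl(fd, F_SETFD, FD_CLOEXEC);
		flags &= ~O_CLOEXEC;
	}
#endif
	/* file status flags such as O_APPEND or O_SYNC */
	if (flags && fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | flags) < 0) {
		close(fd);
		unlink(template);
		return -1;
	}

	return fd;
}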
main() File "/sbin/fence_apc", line 303, in main do_power_off(sock) File "/sbin/fence_apc", line 813, in do_power_off x = do_power_switch(sock, "off") File "/sbi agent "fence_apc" reports: n/fence_apc", line 611, in do_power_switch result_code, response = power_off(txt + ndbuf) File "/sbin/fence_apc", line 817, in power_off x = power_switch(buffer, False, "2", "3"); File "/sbin/fence_apc", line 810, in power_switch raise "un agent "fence_apc" reports: known screen encountered in \n" + str(lines) + "\n" unknown screen encountered in ['', '> 2', '', '', '------- Configure Outlet ------------------------------------------------------', '', ' # State Ph Name Pwr On Dly Pwr Off D agent "fence_apc" reports: ly Reboot Dur.', ' ----------------------------------------------------------------------------', ' 2 ON 1 Fenmrdev03 0 sec 0 sec 5 sec', '', ' 1- Outlet Name : Fenmrdev03', ' 2- Power On Delay(sec) agent "fence_apc" reports: : 0', ' 3- Power Off Delay(sec): 0', ' 4- Reboot Duration(sec): 5', ' 5- Accept Changes : ', '', ' ?- Help, - Back, - Refresh, - Event Log'] -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.zg at gmail.com Fri Oct 31 19:42:04 2008 From: alan.zg at gmail.com (Alan A) Date: Fri, 31 Oct 2008 14:42:04 -0500 Subject: [Linux-cluster] Re: Node won't fence APC switch strange error In-Reply-To: References: Message-ID: clvmd hangs when trying to get status or restart it - I am not sure how related is this? On Fri, Oct 31, 2008 at 2:39 PM, Alan A wrote: > Does anyone have any idea what this means? Any suggestions? > > > >fence_node fenmrdev03 > > > > agent "fence_apc" reports: Traceback (most recent call last): > File "/sbin/fence_apc", line 829, in ? > main() > File "/sbin/fence_apc", line 303, in main > do_power_off(sock) > File "/sbin/fence_apc", line 813, in do_power_off > x = do_power_switch(sock, "off") > File "/sbi > agent "fence_apc" reports: n/fence_apc", line 611, in do_power_switch > result_code, response = power_off(txt + ndbuf) > File "/sbin/fence_apc", line 817, in power_off > x = power_switch(buffer, False, "2", "3"); > File "/sbin/fence_apc", line 810, in power_switch > raise "un > agent "fence_apc" reports: known screen encountered in \n" + str(lines) + > "\n" > unknown screen encountered in > ['', '> 2', '', '', '------- Configure Outlet > ------------------------------------------------------', '', ' # State > Ph Name Pwr On Dly Pwr Off D > agent "fence_apc" reports: ly Reboot Dur.', ' > ----------------------------------------------------------------------------', > ' 2 ON 1 Fenmrdev03 0 sec 0 sec 5 sec', > '', ' 1- Outlet Name : Fenmrdev03', ' 2- Power On Delay(sec) > > agent "fence_apc" reports: : 0', ' 3- Power Off Delay(sec): 0', ' > 4- Reboot Duration(sec): 5', ' 5- Accept Changes : ', '', ' ?- > Help, - Back, - Refresh, - Event Log'] > > > -- > Alan A. > -- Alan A. -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Thu Oct 30 18:49:37 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 30 Oct 2008 14:49:37 -0400 Subject: [Linux-cluster] Data Loss / Files and Folders "2-Node_GFS-Cluster" In-Reply-To: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> References: <2fd157df0810301037jf985e3bne5ca25e91dd74872@mail.gmail.com> Message-ID: <490A01C1.5030003@gmail.com> Jason Ralph wrote: > Hello List, > > We currently have in production a two node cluster with a shared SAS > storage device. 
> Both nodes are running RHEL5 AP and are connected
> directly to the storage device via SAS. We also have configured a
> high availability NFS service directory that is being exported out and
> is mounted on multiple other linux servers.
>
> The problem that I am seeing is:
> FIle and folders that are using the GFS filesystem and live on the
> storage device are mysteriously getting lost. My first thought was
> that maybe one of our many users has deleted them. So I have revoked
> the users privilleges and it is still happening. My other tought was
> that a rsync script may have overwrote these files or deleted them. I
> have stopped all scripting and crons and it has happened again.
>
> Can someone help me with a command or a log to view that would show me
> where any of these folders may have gone? Or has anyone else ever run
> into this type of data loss using the similar setup?
>

I don't (or "didn't") have adequate involvement with RHEL5 GFS, so I may not
know enough to respond. However, users should be aware of the following:
before RHEL 5.1 and community version 2.6.22 kernels, NFS locks (i.e. flock,
POSIX locks, etc.) were not propagated into the filesystem layer. They only
reached the Linux VFS layer (local to one particular server). If your file
accesses need to be synchronized via either flock or POSIX locks *between
multiple hosts (i.e. NFS servers)*, data loss could occur. Newer versions of
RHEL and 2.6.22-and-above kernels should have the code to support this new
feature.

There was an old write-up in section 4.1 of
"http://people.redhat.com/wcheng/Project/nfs.htm" about this issue.

-- Wendy
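The cross-host synchronization Wendy refers to is the ordinary fcntl() POSIX locking
that applications perform on NFS-mounted files. The sketch below is purely illustrative
(the function name, path and record format are made up); the point is that on the older
kernel/GFS combinations described above, two clients sitting behind different NFS
servers could both "acquire" such a lock at the same time, which is one way this kind of
silent data loss happens:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Take an exclusive POSIX (fcntl) lock on a file that lives on the
 * NFS-exported GFS directory, append a record, then release the lock.
 * Whether the lock is honoured between hosts depends on the kernel and
 * GFS versions discussed above.
 */
int append_record(const char *path, const char *record)
{
	struct flock fl;
	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;		/* exclusive lock ... */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;			/* ... over the whole file */

	if (fcntl(fd, F_SETLKW, &fl) < 0) {	/* block until the lock is granted */
		close(fd);
		return -1;
	}

	if (write(fd, record, strlen(record)) < 0) {
		/* short writes and errors are ignored in this sketch */
	}

	fl.l_type = F_UNLCK;
	fcntl(fd, F_SETLK, &fl);	/* release the lock */
	close(fd);
	return 0;
}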