From s.wendy.cheng at gmail.com Sun Jun 1 04:12:21 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Sat, 31 May 2008 23:12:21 -0500 Subject: [Linux-cluster] Re: [linux-lvm] Distributed LVM/filesystem/storage In-Reply-To: <20080531070328.GD19431@lug-owl.de> References: <20080529231213.GY19431@lug-owl.de> <20080531070328.GD19431@lug-owl.de> Message-ID: <484221A5.8040605@gmail.com> Jan-Benedict Glaw wrote: > On Fri, 2008-05-30 09:03:35 +0100, Gerrard Geldenhuis wrote: > >> On Behalf Of Jan-Benedict Glaw >> >>> I'm just thinking about using my friend's overly empty harddisks for a >>> common large filesystem by merging them all together into a single, >>> large storage pool accessible by everybody. >>> > [...] > >>> It would be nice to see if anybody of you did the same before (merging >>> the free space from a lot computers into one commonly used large >>> filesystem), if it was successful and what techniques >>> (LVM/NBD/DM/MD/iSCSI/Tahoe/Freenet/Other P2P/...) you used to get there, >>> and how well that worked out in the end. >>> >> Maybe have a look at GFS. >> > > GFS (or GFS2 fwiw) imposes a single, shared storage as its backend. At > least I get that from reading the documentation. This would result in > merging all the single disks via NBD/LVM to one machine first and > export that merged volume back via NBD/iSCSI to the nodes. In case the > actual data is local to a client, it would still be first send to the > central machine (running LVM) and loaded back from there. Not as > distributed as I hoped, or are there other configuration possibilities > to not go that route? > GFS is certainly developed and well tuned in a SAN environment where the shared storage(s) and cluster nodes reside on the very same fibre channel switch network. However, with its symmetric architecture, nothing can prevent it running on top of a group of iscsi disks (with GFS node as initiator), as long as each node can see and access these disks. It doesn't care where the iscsi targets live, nor how many there are. Of course, whether it can perform well in this environment is another story. In short, the notion that GFS requires all disks to be merged into one machine first, with the merged volume then exported back to the GFS nodes, is *not* correct. I actually have a 4-node cluster in my house. Two nodes run Linux iscsi initiators and form a 2-node GFS cluster. The other two nodes run a special version of FreeBSD as iscsi targets, each directly exporting its local disks to the GFS nodes. I have not put too much IO load on the GFS nodes though (since the cluster is mostly used to study storage block allocation issues - not for real data and/or applications). cc linux-cluster -- Wendy From rcronenwett at gmail.com Sun Jun 1 12:37:46 2008 From: rcronenwett at gmail.com (Ron Cronenwett) Date: Sun, 1 Jun 2008 08:37:46 -0400 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <483ECA36.7070007@xbe.ch> References: <483ECA36.7070007@xbe.ch> Message-ID: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> Hi Lorenz I had a similar problem while testing with Centos 5.1 on a VMWare workstation setup. One more difference, I have been using system-config-cluster to configure the cluster. Luci seemed to be giving me problems with setting up a mount of an NFS export. But I have not retried Luci since changing the selinux setting I mention below. I found if I did not configure SELinux with setenforce permissive, the /usr/share/cluster/apache.sh script did not execute.
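(For reference, a minimal sketch of what I mean by the SELinux change - this assumes the stock RHEL5/CentOS 5 SELinux tools, so adjust to your setup:

getenforce                # show the current mode
setenforce 0              # switch to permissive until the next reboot

and set SELINUX=permissive in /etc/selinux/config if you want it to survive a reboot.)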
Once that runs, it creates /etc/cluster/apache/apache:"name". In that subdirectory, the script creates an httpd.conf file from /etc/httpd/httpd.conf. I also found the new httpd.conf had the Listen statement commented out even though I had set it to my clustered address in /etc/httpd/httpd. I needed to manually uncomment the Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf. Hope this helps. Ron C. On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner wrote: > > Hello everybody > > I have the following test setup: > > - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1 > - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test) > - 4 IP resources defined > - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk > > Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like > this: > > May 28 16:18:15 testsrv clurgmgrd: [18475]: Starting Service apache:test_httpd > Failed > May 28 16:18:15 testsrv clurgmgrd[18475]: start on apache "test_httpd" returned 1 (generic error) > May 28 16:18:15 testsrv clurgmgrd[18475]: #68: Failed to start service:test_proxy_http; return value: 1 > May 28 16:18:15 testsrv clurgmgrd[18475]: Stopping service service:test_proxy_http > May 28 16:18:16 testsrv clurgmgrd: [18475]: Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist > May 28 16:18:16 testsrv clurgmgrd: [18475]: Stopping Service apache:test_httpd > Failed > May 28 16:18:16 testsrv clurgmgrd[18475]: stop on apache "test_httpd" returned 1 (generic error) > May 28 16:18:16 testsrv clurgmgrd[18475]: #71: Relocating failed service service:test_proxy_http > > I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with > a "Script Resource". > > Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation. > > Kind Regards > Lorenz > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From s.wendy.cheng at gmail.com Sun Jun 1 13:50:26 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Sun, 01 Jun 2008 08:50:26 -0500 Subject: [Linux-cluster] Re: [linux-lvm] Distributed LVM/filesystem/storage In-Reply-To: <20080601070726.GK19431@lug-owl.de> References: <20080529231213.GY19431@lug-owl.de> <20080531070328.GD19431@lug-owl.de> <484221A5.8040605@gmail.com> <20080601070726.GK19431@lug-owl.de> Message-ID: <4842A922.6040102@gmail.com> Jan-Benedict Glaw wrote: > On Sat, 2008-05-31 23:12:21 -0500, Wendy Cheng wrote: > >> Jan-Benedict Glaw wrote: >> >>> On Fri, 2008-05-30 09:03:35 +0100, Gerrard Geldenhuis wrote: >>> >>>> On Behalf Of Jan-Benedict Glaw >>>> >>>>> I'm just thinking about using my friend's overly empty harddisks for a >>>>> common large filesystem by merging them all together into a single, >>>>> large storage pool accessible by everybody. >>>>> >>> [...] >>> >>> >>>>> It would be nice to see if anybody of you did the same before (merging >>>>> the free space from a lot computers into one commonly used large >>>>> filesystem), if it was successful and what techniques >>>>> (LVM/NBD/DM/MD/iSCSI/Tahoe/Freenet/Other P2P/...) 
you used to get there, >>>>> and how well that worked out in the end. >>>>> >>>> Maybe have a look at GFS. >>>> >>> GFS (or GFS2 fwiw) imposes a single, shared storage as its backend. At >>> least I get that from reading the documentation. This would result in >>> merging all the single disks via NBD/LVM to one machine first and >>> export that merged volume back via NBD/iSCSI to the nodes. In case the >>> actual data is local to a client, it would still be first send to the >>> central machine (running LVM) and loaded back from there. Not as >>> distributed as I hoped, or are there other configuration possibilities >>> to not go that route? >>> >> However, with its symmetric architecture, >> nothing can prevent it running on top of a group of iscsi disks (with >> GFS node as initiator), as long as each node can see and access these >> disks. It doesn't care where the iscsi targets live, nor how many there >> are. >> > > So I'd configure each machine's empty disk/partition as an iSCSI > target and let them show up on every "client" machine and run that > setup. How good will GFS deal with temporary (or total) outage of > single targets? Eg. 24h disconnects with ADSL connectivity etc.? > > High availability will not work well in this particular setup - it is more about data and storage sharing between GFS nodes. Note that GFS normally runs on top of CLVM (clustered lvm, in case you don't know about it). You might want to check current (Linux) CLVM raid level support to see whether it fits your needs. -- Wendy From doobs72 at hotmail.com Sun Jun 1 19:33:14 2008 From: doobs72 at hotmail.com (doobs72 _) Date: Sun, 1 Jun 2008 19:33:14 +0000 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding Message-ID: Hi, I'm having fencing problems in my 3 node cluster running on RHEL5.0 which involves bonding. I have 3 servers A, B & C in a cluster with bonding configured on eth2 & eth3 for my cluster traffic.
The config is as below: DEVICE=eth2 BOOTPROTO=none ONBOOT=yes TYPE=Ethernet MASTER=bond1 SLAVE=yes USRCTL=no DEVICE=eth3 BOOTPROTO=none ONBOOT=yes TYPE=Ethernet MASTER=bond1 SLAVE=yes USRCTL=no DEVICE=bond1 IPADDR=192.168.x.x NETMASK=255.255.255.0 NETWORK=192.168.x.0 BROADCAST=192.168.x.255 ONBOOT=YES BOOTPROTO=none The /etc/modprobe.conf file is configured as below: alias eth0 bnx2 alias eth1 bnx2 alias eth2 e1000 alias eth3 e1000 alias eth4 e1000 alias eth5 e1000 alias scsi_hostadapter cciss alias bond0 bonding options bond0 miimon=100 mode=active-backup max_bonds=3 alias bond1 bonding options bond1 miimon=100 mode=active-backup alias bond2 bonding options bond2 miimon=100 mode=active-backup alias scsi_hostadapter1 qla2xxx alias scsi_hostadapter2 usb-storage The cluster starts up OK, however when I try to test the bonded interfaces my troubles begin. On Node C if I "ifdown bond1", the node C, is fenced and everything works as expected. However if on Node C, I take down the interfaces one at a time i.e. "ifdown eth2", - the cluster stays up as expected using eth3 for routing traffic "ifdown eth3" then node C is fenced by Node A. However in the /var/log/messages file on Node C I see a message saying that Node B will be fenced. The outcome is Nodes C & B are fenced. My question is why does node B get fenced as well? D. _________________________________________________________________ http://clk.atdmt.com/UKM/go/msnnkmgl0010000009ukm/direct/01/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zac at sprackett.com Mon Jun 2 04:48:12 2008 From: zac at sprackett.com (S. Zachariah Sprackett) Date: Mon, 2 Jun 2008 00:48:12 -0400 Subject: [Linux-cluster] Announcing Perl bindings for libcman Message-ID: Hello, I'd like to announce the availability of my Perl bindings for libcman. You can grab them from here: http://zac.sprackett.com/cman/cluster-cman-0.01.tar.gz A simple example script would be as follows: use Cluster::CMAN; my $cman = new Cluster::CMAN(); $cman->init(); foreach ($cman->get_nodes) { print "Found a node: " . $_->{name} ."\n"; } print "Cluster is" . ($cman->is_quorate() ? "" : " NOT") . " quorate!\n"; $cman->finish(); These bindings also fully support both the notification and recv_data callbacks allowing you to take advantage of them from within perl. Please let me know if you have any trouble with them. -z -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Mon Jun 2 06:10:43 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 2 Jun 2008 08:10:43 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.03 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 4th release from the master branch: 2.99.03. The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. 
In order to build the 2.99.03 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc3) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.03.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.02): Bob Peterson (1): bz 446085: Back-port faster bitfit algorithm from gfs2 for better Christine Caulfield (1): [CMAN] Don't busy-loop if we can't get a node name David Teigland (3): gfs_controld: rename files gfs_controld: move recover.c gfs_controld: restructuring Fabio M. Di Nitto (19): [BUILD] Fix sparc #ifdef according to the new gcc tables [MISC] Update copyright [BUILD] Fix build order Merge branch 'master' of ssh://sources.redhat.com/git/cluster [BUILD] Fix dlm_controld linking [BUILD] Fix rg_test linking [BUILD] Fix install permissions [GFS2] Use proper include dir for libvolume_id [FENCE] Fix copyright header for fence_ifmib manpage [FENCE] Fix ifmib README to report the right fence agent [BUILD] Plugin the new shiny fence_ifmib agent [CCS] Use absolute path for queries [CONFIG] Fix lots of bugs in libccsconfdb [BUILD] Add fence_lpar fencing agent to the build system [GFS] remove symlink to umount.gfs2 [GROUP] libgfscontrol: fix build with gcc-4.3 [BUILD] Change build system to cope with new libgfscontrol [BUILD] gfs2 requires group to build [BUILD] Fix mount.gfs2 build Lon Hohberger (3): [rgmanager] Apply patch from Marcelo Azevedo to make migration more robust [rgmanager] Fix live migration option (broken in last commit) [rgmanager] Use /cluster/rm instead of //rm Marek 'marx' Grac (4): [FENCE] Fix #248609: SSH support in Bladecenter fencing (ssh) [FENCE] Fix #446995: Parse error: Unknown option 'switch=3' [FENCE] Fix #447378 - fence_apc unable to connect via ssh to APC 7900 [FENCE]: Fix #237266: New fence agent for HMC/LPAR Ross Vandegrift (1): [FENCE] Add fence_ifmib new agent Ryan McCabe (3): fence: fixes and cleanups to fencing.py library libfence: handle EINTR correctly libfence: update copyright notice Makefile | 4 +- ccs/ccs_tool/update.c | 6 +- ccs/daemon/misc.c | 8 +- cman/daemon/cmanconfig.c | 2 +- cman/qdisk/disk_util.c | 2 +- config/libs/libccsconfdb/libccs.c | 166 +- configure | 14 + fence/agents/apc/fence_apc.py | 90 +- fence/agents/bladecenter/fence_bladecenter.py | 2 +- fence/agents/ifmib/Makefile | 18 + fence/agents/ifmib/README | 45 + fence/agents/ifmib/fence_ifmib.py | 221 ++ fence/agents/lib/fencing.py.py | 88 +- fence/agents/lpar/Makefile | 18 + fence/agents/lpar/fence_lpar.py | 97 + fence/libfence/agent.c | 49 +- fence/libfence/libfence.h | 5 +- fence/man/Makefile | 1 + fence/man/fence_ifmib.8 | 69 + gfs-kernel/src/gfs/bits.c | 85 +- gfs-kernel/src/gfs/bits.h | 3 +- gfs-kernel/src/gfs/rgrp.c | 3 +- gfs/Makefile | 5 +- gfs2/mkfs/Makefile | 1 + gfs2/mount/Makefile | 27 
+- gfs2/mount/mount.gfs2.c | 20 +- gfs2/mount/umount.gfs2.c | 168 -- gfs2/mount/util.c | 475 +---- gfs2/mount/util.h | 2 +- group/Makefile | 4 +- group/dlm_controld/Makefile | 2 +- group/gfs_control/Makefile | 41 + group/gfs_control/main.c | 212 ++ group/gfs_controld/Makefile | 11 +- group/gfs_controld/config.c | 180 ++ group/gfs_controld/config.h | 47 + group/gfs_controld/cpg-old.c | 2686 +++++++++++++++++++++++ group/gfs_controld/cpg-old.h | 60 + group/gfs_controld/cpg.c | 289 --- group/gfs_controld/gfs_controld.h | 49 + group/gfs_controld/gfs_daemon.h | 268 +++ group/gfs_controld/group.c | 64 +- group/gfs_controld/lock_dlm.h | 310 --- group/gfs_controld/main.c | 1219 ++++++----- group/gfs_controld/member_cman.c | 29 +- group/gfs_controld/plock.c | 228 +-- group/gfs_controld/recover.c | 2805 ------------------------- group/gfs_controld/util.c | 197 ++ group/libgfscontrol/Makefile | 53 + group/libgfscontrol/libgfscontrol.h | 131 ++ group/libgfscontrol/main.c | 438 ++++ make/defines.mk.input | 2 + rgmanager/include/platform.h | 2 +- rgmanager/include/reslist.h | 2 +- rgmanager/src/clulib/vft.c | 2 +- rgmanager/src/daemons/Makefile | 2 +- rgmanager/src/resources/Makefile | 17 +- rgmanager/src/resources/vm.sh | 20 +- scripts/fenceparse | 2 +- 59 files changed, 6197 insertions(+), 4869 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSEOO8AgUGcMLQ3qJAQLeIQ//ZExWyhAdAdWrlFn5BLoThySCmt2LIvjf TUQAbn8/kXExfdQjB94rfwlCwfml3G7VELZ9g4m9eVhKWATBnKGW+zFyFLPoQnKT XTXre1WDqvQFeoWN/TlmeQ+AhxVCWHrDsvKnWah03ns4dspd85224dHa2MWe0vJe grGhfy88tB+7nbVKC9vJgF5BDUVDJvtAm7BDs0tJYn87JE2riUIEZBJSIyXyrC1x QyjQJrrZxHm2h9g/oDUXTg+BmvAP+RjXaRqQMYFKo/7NoIjR5ZIlecDYHLs5dnbM /dCjgQuFhb3Y+gMmEmb9zA6F7FPbZegFfVMG+bdEt3vwnRIU3RpyKNsZIAp8Z3eK jJQQ3JMmszePFBX3NZoB0BqGuEvUNmt4u82NqLGV3BjphxLzyQMjBSt0BzaLu4fj fkL170J/wDJHfrW7sqkUflrPRRtDXzKXh+n0x9U+hkSA4Oh/haf22/7liRzez9wh xKc4OGnEk+ZeMQ4lR/SXNEr9sOANaJgYrotoNS3NZ2wjEOdMjTYL+JV5k/S9OfHG 3g2XS8CfjuWlvfYxEv9bbWBH4mtBY8HWCEslnXjWUpNs8tpAgfvUwJS+u00JjwDR /RfkaynapgSV3OqzRTOi1iXiEzpsV/n+Dp7zxBgdCc2kECq28tcIDPjzN+ShfaER o7NWXbCZXCY= =jHFQ -----END PGP SIGNATURE----- From maciej.bogucki at artegence.com Mon Jun 2 06:24:12 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Mon, 02 Jun 2008 08:24:12 +0200 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding In-Reply-To: References: Message-ID: <4843920C.60109@artegence.com> doobs72 _ wrote: > > Hi > > > > I?m having fencing problems in my 3 node cluster running on > RHEL5.0 which involves bonding. > > > > I have 3 severs A, B & C in a cluster with bonding configured on eth2 > & eth3 for my cluster traffic. 
The config is as below: > > > > DEVICE=eth2 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > DEVICE=eth3 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > > > DEVICE=bond1 > > IPADDR=192.168.x.x > > NETMASK=255.255.255.0 > > NETWORK=192.168.x.0 > > BROADCAST=192.168.x.255 > > ONBOOT=YES > > BOOTPROTO=none > > > > The /etc/modprobe.conf file is configured as below: > > > > alias eth0 bnx2 > > alias eth1 bnx2 > > alias eth2 e1000 > > alias eth3 e1000 > > alias eth4 e1000 > > alias eth5 e1000 > > alias scsi_hostadapter cciss > > alias bond0 bonding > > options bond0 miimon=100 mode=active-backup max_bonds=3 > > alias bond1 bonding > > options bond1 miimon=100 mode=active-backup > > alias bond2 bonding > > options bond2 miimon=100 mode=active-backup > > alias scsi_hostadapter1 qla2xxx > > alias scsi_hostadapter2 usb-storage > > > > > > The cluster starts up OK, however when I try to test the bonded > interfaces my troubles begin. > > On Node C if I "ifdown bond1", the node C, is fenced and everything > works as expected. > > > > However if on Node C, I take down the interfaces one at a time i.e. > > "ifdown eth2", - the cluster stays up as expected using eth3 for > routing traffic > > "ifdown eth3" > > then node C is fenced by Node A. However in the /var/log/messages file > on Node C I see a message saying that Node B will be fenced. The > outcome is Nodes C & B are fenced. > > > > My question is why does node B get fenced as well? > > Hello, First of all, You have the problem with bonding. Switch off the cluster, and investigate why when You do "ifdown eth3" the cluster goes down. I suspect that the problem is with e1000 driver. I suppose that C is the master of the cluster and it is faster than election of new master(of A,B). You could identify the master by: i=`cman_tool services | grep -A 1 default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk '{print $1,$5}' | grep "^$i" To resolve this issue You need to use more than one communication medium fe. ethernet or disk quorum if You have one? Best Regards Maciej Bogucki From Dinesh.Patel at AAH.co.uk Mon Jun 2 07:35:59 2008 From: Dinesh.Patel at AAH.co.uk (Patel Dino) Date: Mon, 2 Jun 2008 08:35:59 +0100 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35561@GBW607SC0054.GB-WS.net> At the time Node A is the master. I do have a quorum disk setup. When the two nodes (B & C) get fenced the cluster stays up with Node A & the quorum disk. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Monday, June 02, 2008 7:24 AM To: linux clustering Subject: Re: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding doobs72 _ wrote: > > Hi > > > > I'm having fencing problems in my 3 node cluster running on > RHEL5.0 which involves bonding. > > > > I have 3 severs A, B & C in a cluster with bonding configured on eth2 > & eth3 for my cluster traffic. 
The config is as below: > > > > DEVICE=eth2 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > DEVICE=eth3 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > > > DEVICE=bond1 > > IPADDR=192.168.x.x > > NETMASK=255.255.255.0 > > NETWORK=192.168.x.0 > > BROADCAST=192.168.x.255 > > ONBOOT=YES > > BOOTPROTO=none > > > > The /etc/modprobe.conf file is configured as below: > > > > alias eth0 bnx2 > > alias eth1 bnx2 > > alias eth2 e1000 > > alias eth3 e1000 > > alias eth4 e1000 > > alias eth5 e1000 > > alias scsi_hostadapter cciss > > alias bond0 bonding > > options bond0 miimon=100 mode=active-backup max_bonds=3 > > alias bond1 bonding > > options bond1 miimon=100 mode=active-backup > > alias bond2 bonding > > options bond2 miimon=100 mode=active-backup > > alias scsi_hostadapter1 qla2xxx > > alias scsi_hostadapter2 usb-storage > > > > > > The cluster starts up OK, however when I try to test the bonded > interfaces my troubles begin. > > On Node C if I "ifdown bond1", the node C, is fenced and everything > works as expected. > > > > However if on Node C, I take down the interfaces one at a time i.e. > > "ifdown eth2", - the cluster stays up as expected using eth3 for > routing traffic > > "ifdown eth3" > > then node C is fenced by Node A. However in the /var/log/messages file > on Node C I see a message saying that Node B will be fenced. The > outcome is Nodes C & B are fenced. > > > > My question is why does node B get fenced as well? > > Hello, First of all, You have the problem with bonding. Switch off the cluster, and investigate why when You do "ifdown eth3" the cluster goes down. I suspect that the problem is with e1000 driver. I suppose that C is the master of the cluster and it is faster than election of new master(of A,B). You could identify the master by: i=`cman_tool services | grep -A 1 default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk '{print $1,$5}' | grep "^$i" To resolve this issue You need to use more than one communication medium fe. ethernet or disk quorum if You have one? Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ************************************************************************ DISCLAIMER The information contained in this e-mail is confidential and is intended for the recipient only. If you have received it in error, please notify us immediately by reply e-mail and then delete it from your system. Please do not copy it or use it for any other purposes, or disclose the content of the e-mail to any other person or store or copy the information in any medium. The views contained in this e-mail are those of the author and not necessarily those of AAH Pharmaceuticals Ltd. AAH Pharmaceuticals Ltd is a company incorporated in England and Wales under company number 123458 and whose registered office is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX ************************************************************************ From denisb+gmane at gmail.com Mon Jun 2 11:20:46 2008 From: denisb+gmane at gmail.com (denis) Date: Mon, 02 Jun 2008 13:20:46 +0200 Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty In-Reply-To: <483FACEF.2080509@redhat.com> References: <483FACEF.2080509@redhat.com> Message-ID: Christine Caulfield wrote: >> denis wrote: >>> What does "Flags: Dirty" mean? Is it anything to worry about? 
>> http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html >> NODE_FLAGS_DIRTY - This node has internal state and must not join >> a cluster that also has state. >> What does this actually imply? Anything to care about? How would this >> node "recover" from being dirty? > It's a perfectly normal state. in fact it's expected if you are running > services. It simply means that the cluster has some services running > that have state of their own that cannot be recovered without a full > restart. I would be more worried if you did NOT see this in cman_tool > status. It's NOT a warning. don't worry about it :) Thanks for clarification. I sort of figured this out, but confirmation is appreciated. Regards -- Denis Braekhus From stephan.windmueller at cs.uni-dortmund.de Mon Jun 2 12:47:30 2008 From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=) Date: Mon, 2 Jun 2008 14:47:30 +0200 Subject: [Linux-cluster] qdiskd does not start Message-ID: <20080602124730.GA16072@speutel.de> Hello! I created a quorum disk with mkqdisk which is shown when I run "mkqdisk -L" | # mkqdisk -L | mkqdisk v2.0 | | /dev/sdc: | Magic: eb7a62c2 | Label: quorum | Created: Mon Jun 2 11:21:29 2008 | Host: clnode01 My quorum-config in cluster.conf is: | | | | | But when the cluster starts, I can not see that it makes use of the quorum disk: | Nodes: 2 | Expected votes: 3 | Total votes: 2 | Quorum: 2 Neither I can see anything in the daemon-log nor is there a file /tmp/quorum-state. Does anyone know why the qdisk daemon does not start here? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From Alain.Moulle at bull.net Mon Jun 2 12:54:54 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 02 Jun 2008 14:54:54 +0200 Subject: [Linux-cluster] CS5 / what does that means ? Message-ID: <4843ED9E.5080109@bull.net> Hi What can be the causes of this message during a relocate of service ? #60: Mangled reply from member #1 during RG relocate Consequence is that the service remains "starting" and never goes "started". Thanks Regards Alain Moull? From jakub.suchy at enlogit.cz Mon Jun 2 13:33:01 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Mon, 2 Jun 2008 15:33:01 +0200 Subject: [Linux-cluster] heartbeat over 2 NICs Message-ID: <20080602133301.GD4368@localhost> Hi, I would like to know, if it's possible to run heartbeat (through cman) over two dedicated network NICs. AFAIK, in old hearbeat code, it was possible using serial + NIC. Unfortunately, I was unable to find this in any documentation and this is the first time a customer is requesting this. (I am not talking about network bonding). Thanks you very much, Jakub Suchy From ccaulfie at redhat.com Mon Jun 2 13:38:37 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 02 Jun 2008 14:38:37 +0100 Subject: [Linux-cluster] heartbeat over 2 NICs In-Reply-To: <20080602133301.GD4368@localhost> References: <20080602133301.GD4368@localhost> Message-ID: <4843F7DD.8000808@redhat.com> Jakub Suchy wrote: > Hi, > I would like to know, if it's possible to run heartbeat (through cman) > over two dedicated network NICs. AFAIK, in old hearbeat code, it was > possible using serial + NIC. Unfortunately, I was unable to find this in > any documentation and this is the first time a customer is requesting > this. (I am not talking about network bonding). > Basically, no. 
If you want to use 2 NICs then bonding is what you need. cman can use dual NICs after a fashion but it's not supported and even less well tested. Sorry. -- Chrissie From Dinesh.Patel at AAH.co.uk Mon Jun 2 13:48:52 2008 From: Dinesh.Patel at AAH.co.uk (Patel Dino) Date: Mon, 2 Jun 2008 14:48:52 +0100 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35562@GBW607SC0054.GB-WS.net> I think I know what's going on ... When I take down the two slave interfaces (eth2 & eth3) on Node C, the bond1 interface remains UP. This means that the Node C still thinks its OK, however it can not see Node A & B, and tries to fence Node B. Node A which is the master fences Node C. I'm not sure how to resolve this any help would be appreciated. D. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patel Dino Sent: Monday, June 02, 2008 8:36 AM To: linux clustering Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding At the time Node A is the master. I do have a quorum disk setup. When the two nodes (B & C) get fenced the cluster stays up with Node A & the quorum disk. -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Monday, June 02, 2008 7:24 AM To: linux clustering Subject: Re: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding doobs72 _ wrote: > > Hi > > > > I'm having fencing problems in my 3 node cluster running on > RHEL5.0 which involves bonding. > > > > I have 3 severs A, B & C in a cluster with bonding configured on eth2 > & eth3 for my cluster traffic. The config is as below: > > > > DEVICE=eth2 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > DEVICE=eth3 > > BOOTPROTO=none > > ONBOOT=yes > > TYPE=Ethernet > > MASTER=bond1 > > SLAVE=yes > > USRCTL=no > > > > > > DEVICE=bond1 > > IPADDR=192.168.x.x > > NETMASK=255.255.255.0 > > NETWORK=192.168.x.0 > > BROADCAST=192.168.x.255 > > ONBOOT=YES > > BOOTPROTO=none > > > > The /etc/modprobe.conf file is configured as below: > > > > alias eth0 bnx2 > > alias eth1 bnx2 > > alias eth2 e1000 > > alias eth3 e1000 > > alias eth4 e1000 > > alias eth5 e1000 > > alias scsi_hostadapter cciss > > alias bond0 bonding > > options bond0 miimon=100 mode=active-backup max_bonds=3 > > alias bond1 bonding > > options bond1 miimon=100 mode=active-backup > > alias bond2 bonding > > options bond2 miimon=100 mode=active-backup > > alias scsi_hostadapter1 qla2xxx > > alias scsi_hostadapter2 usb-storage > > > > > > The cluster starts up OK, however when I try to test the bonded > interfaces my troubles begin. > > On Node C if I "ifdown bond1", the node C, is fenced and everything > works as expected. > > > > However if on Node C, I take down the interfaces one at a time i.e. > > "ifdown eth2", - the cluster stays up as expected using eth3 for > routing traffic > > "ifdown eth3" > > then node C is fenced by Node A. However in the /var/log/messages file > on Node C I see a message saying that Node B will be fenced. The > outcome is Nodes C & B are fenced. > > > > My question is why does node B get fenced as well? > > Hello, First of all, You have the problem with bonding. Switch off the cluster, and investigate why when You do "ifdown eth3" the cluster goes down. I suspect that the problem is with e1000 driver. 
I suppose that C is the master of the cluster and it is faster than election of new master(of A,B). You could identify the master by: i=`cman_tool services | grep -A 1 default | tail -1 | sed -e 's/\[\(.\).*/\1/'`; cman_tool nodes | awk '{print $1,$5}' | grep "^$i" To resolve this issue You need to use more than one communication medium fe. ethernet or disk quorum if You have one? Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ************************************************************************ DISCLAIMER The information contained in this e-mail is confidential and is intended for the recipient only. If you have received it in error, please notify us immediately by reply e-mail and then delete it from your system. Please do not copy it or use it for any other purposes, or disclose the content of the e-mail to any other person or store or copy the information in any medium. The views contained in this e-mail are those of the author and not necessarily those of AAH Pharmaceuticals Ltd. AAH Pharmaceuticals Ltd is a company incorporated in England and Wales under company number 123458 and whose registered office is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX ************************************************************************ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From orkcu at yahoo.com Mon Jun 2 14:02:40 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Mon, 2 Jun 2008 07:02:40 -0700 (PDT) Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding In-Reply-To: <8C22506D4103BE40B23DFE9E04B2D8FE05E35562@GBW607SC0054.GB-WS.net> Message-ID: <320668.48996.qm@web50604.mail.re2.yahoo.com> --- On Mon, 6/2/08, Patel Dino wrote: > From: Patel Dino > Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding > To: "linux clustering" > Received: Monday, June 2, 2008, 9:48 AM > I think I know what's going on ... > > When I take down the two slave interfaces (eth2 & eth3) > on Node C, the > bond1 interface remains UP. > This means that the Node C still thinks its OK, however it > can not see > Node A & B, and tries to fence Node B. > Node A which is the master fences Node C. > > I'm not sure how to resolve this any help would be > appreciated. as previously said, it is a bond problem I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly I don?t know if that is the case with rhel5.0, but maybe it is. you can check the archives of this list if you want to find which version of e1000 driver fix my problem. cu roger __________________________________________________________________ Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/ From ricks at nerd.com Mon Jun 2 16:31:59 2008 From: ricks at nerd.com (Rick Stevens) Date: Mon, 02 Jun 2008 09:31:59 -0700 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> References: <483ECA36.7070007@xbe.ch> <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> Message-ID: <4844207F.2090706@nerd.com> Ron Cronenwett wrote: > Hi Lorenz > > I had a similar problem while testing with Centos 5.1 on a VMWare > workstation setup. 
One more difference, I have been using > system-config-cluster > to configure the cluster. Luci seemed to be giving me problems with > setting up a mount of an NFS export. But I have not retried Luci since > changing > the selinux setting I mention below. > > I found if I did not configure SELinux with setenforce permissive, the > /usr/share/cluster/apache.sh script did not execute. Once that runs, > it creates > /etc/cluster/apache/apache:"name". In that subdirectory, the script > creates an httpd.conf file from /etc/httpd/httpd.conf. I also found > the new httpd.conf > had the Listen statement commented out even though I had set it to my > clustered address in /etc/httpd/httpd. I needed to manually uncomment > the > Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf. Have you checked the SELinux error messages in either /var/log/messages or /var/log/audit/audit.log (or the output of audit2allow -a) to see what SELinux policy is being violated? I'd do that, then bugzilla the apache.sh script and cite your findings. ---------------------------------------------------------------------- - Rick Stevens, Systems Engineer rps2 at nerd.com - - Hosting Consulting, Inc. - - - - The Theory of Rapitivity: E=MC Hammer - - -- Glenn Marcus (via TopFive.com) - ---------------------------------------------------------------------- From Dinesh.Patel at AAH.co.uk Mon Jun 2 18:02:27 2008 From: Dinesh.Patel at AAH.co.uk (Patel Dino) Date: Mon, 2 Jun 2008 19:02:27 +0100 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding Message-ID: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net> I've updated the e1000 drivers from version7.2.7 to version7.6.15.5 and still getting the same problems. Any more suggestions would be appreciated. D -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Roger Pe?a Sent: Monday, June 02, 2008 3:03 PM To: linux clustering Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding --- On Mon, 6/2/08, Patel Dino wrote: > From: Patel Dino > Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding > To: "linux clustering" > Received: Monday, June 2, 2008, 9:48 AM > I think I know what's going on ... > > When I take down the two slave interfaces (eth2 & eth3) > on Node C, the > bond1 interface remains UP. > This means that the Node C still thinks its OK, however it > can not see > Node A & B, and tries to fence Node B. > Node A which is the master fences Node C. > > I'm not sure how to resolve this any help would be > appreciated. as previously said, it is a bond problem I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly I don?t know if that is the case with rhel5.0, but maybe it is. you can check the archives of this list if you want to find which version of e1000 driver fix my problem. cu roger __________________________________________________________________ Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ***************************************************************************** DISCLAIMER The information contained in this e-mail is confidential and is intended for the recipient only. 
If you have received it in error, please notify us immediately by reply e-mail and then delete it from your system. Please do not copy it or use it for any other purposes, or disclose the content of the e-mail to any other person or store or copy the information in any medium. The views contained in this e-mail are those of the author and not necessarily those of AAH Pharmaceuticals Ltd. AAH Pharmaceuticals Ltd is a company incorporated in England and Wales under company number 123458 and whose registered office is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX ***************************************************************************** From cma at analog.org Mon Jun 2 22:09:55 2008 From: cma at analog.org (Chris Adams) Date: Mon, 2 Jun 2008 17:09:55 -0500 Subject: [Linux-cluster] /sbin/mount.gfs thinks fs is gfs2? Message-ID: <20080602220955.GA83307@analog.org> I am upgrading a system with a GFS 6.0 filesystem from RHEL 3 to CentOS 5, and subsequently GFS 6.0 to 6.1. I've followed the instructions here: http://www.redhat.com/docs/manuals/csgfs/browse/rh-gfs-en/ap-license.html and subsequently ran gfs_tool sb device proto lock_dlm on my gfs lv The cluster is up and quorate, and clvmd sees the gfs lv, but when I try to mount it, I get: # mount -t gfs -o upgrade /dev/mapper/pool_gfs-pool_gfs /VAULT10/ /sbin/mount.gfs: there appears to be a GFS2, not GFS, filesystem on /dev/mapper/pool_gfs-pool_gfs I'm not sure why this is failing. For grins, I tried mounting it as a gfs2 filesystem and this is what I get: # mount -t gfs2 -o upgrade /dev/mapper/pool_gfs-pool_gfs /VAULT10/ /sbin/mount.gfs2: there appears to be a GFS, not GFS2, filesystem on /dev/mapper/pool_gfs-pool_gfs I have successfully performed the upgrade if I use centos 4 as an intermediate step in the upgrade and perform the upgrade steps there and the conversion from lock_gulmd to dlm. However, there are several clusters we need to do this with, so that's a painful option to avoid if possible. Here is the output from gfs_tool: # gfs_tool sb /dev/mapper/pool_gfs-pool_gfs all mh_magic = 0x01161970 mh_type = 1 mh_generation = 0 mh_format = 100 mh_incarn = 0 sb_fs_format = 1308 sb_multihost_format = 1401 sb_flags = 0 sb_bsize = 4096 sb_bsize_shift = 12 sb_seg_size = 16 no_formal_ino = 21 no_addr = 21 no_formal_ino = 22 no_addr = 22 no_formal_ino = 25 no_addr = 25 sb_lockproto = lock_dlm sb_locktable = cma:pool_gfs no_formal_ino = 23 no_addr = 23 no_formal_ino = 24 no_addr = 24 sb_reserved = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 thanks, -chris From Santosh.Panigrahi at in.unisys.com Tue Jun 3 03:57:24 2008 From: Santosh.Panigrahi at in.unisys.com (Panigrahi, Santosh Kumar) Date: Tue, 3 Jun 2008 09:27:24 +0530 Subject: [Linux-cluster] qdiskd does not start In-Reply-To: <20080602124730.GA16072@speutel.de> References: <20080602124730.GA16072@speutel.de> Message-ID: I got an impression from your mail that you have not started qdiskd service. If above is the case then, you have to explicitly start the qdiskd service in all the cluster nodes after starting the cman/rgmanager service. Don't expect cman/rgmanager to start the qdiskd service. Unless one will start the qdiskd service, the cluster won't consider the qdisk configuration. 
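For example, something like this on every node (a rough sketch assuming the stock RHEL5 init scripts; adjust to your distribution):

service qdiskd start
chkconfig qdiskd on       # if the init script is chkconfig-managed, this also starts it at boot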
Thanks, Santosh -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Stephan Windm?ller Sent: Monday, June 02, 2008 6:18 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] qdiskd does not start Hello! I created a quorum disk with mkqdisk which is shown when I run "mkqdisk -L" | # mkqdisk -L | mkqdisk v2.0 | | /dev/sdc: | Magic: eb7a62c2 | Label: quorum | Created: Mon Jun 2 11:21:29 2008 | Host: clnode01 My quorum-config in cluster.conf is: | | | | | But when the cluster starts, I can not see that it makes use of the quorum disk: | Nodes: 2 | Expected votes: 3 | Total votes: 2 | Quorum: 2 Neither I can see anything in the daemon-log nor is there a file /tmp/quorum-state. Does anyone know why the qdisk daemon does not start here? From stephan.windmueller at cs.uni-dortmund.de Tue Jun 3 07:11:32 2008 From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=) Date: Tue, 3 Jun 2008 09:11:32 +0200 Subject: [Linux-cluster] qdiskd does not start In-Reply-To: References: <20080602124730.GA16072@speutel.de> Message-ID: <20080603071132.GA8765@speutel.de> On Tue, 03. Jun 2008, Panigrahi, Santosh Kumar wrote: > I got an impression from your mail that you have not started qdiskd > service. The service is started from the init script. > If above is the case then, you have to explicitly start the qdiskd > service in all the cluster nodes after starting the cman/rgmanager > service. I tried that, but after running "qdiskd" as root there is no running daemon. syslog says: | qdiskd: Heuristic: 'ping xxx.xxx.xxx.xxx -c1 -t1' score=1 interval=2 tko=1 | qdiskd: Heuristic: 'ping yyy.yyy.yyy.yyy -c1 -t1' score=1 interval=2 tko=1 | qdiskd: Heuristic: 'ping zzz.zzz.zzz.zzz -c1 -t1' score=1 interval=2 tko=1 | qdiskd: 3 heuristics loaded | qdiskd: Quorum Daemon: 3 heuristics, 1 interval, 10 tko, 1 votes With strace I see that qdiskd reads /var/run/qdiskd.pid and tries to access this process (which is not running any more). Even when I delete this pid-file nothing changes. Regards Stephan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From fdinitto at redhat.com Tue Jun 3 09:16:11 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 3 Jun 2008 11:16:11 +0200 (CEST) Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!) Message-ID: Hi guys, I just landed the last bits in libccs to support both xpath lite and full xpath queries. With this new code, a couple of things need to be checked across all applications using libccs. Relevant changes: ccs_connect() used to return only when cluster is quorated. This is not the case anymore. ccs_connect will return as soon as it can connect to aisexec and init properly (or fail). You can use cman_is_quorate from libcman for the same feature. ccs_force_connect() used to take a cluster name in input. The API is still the same, but the cluster name is now ignored (it wasn't in used before either). in order to use xpath lite or full xpath, set fullxpath (int from ccs.h) to either 0 (xpath lite and default) or 1 (full xpath) before invoking ccs_connect or ccs_force_connect. In order to switch from one mode to another, you have to disconnect and connect again. WARNING: use full xpath only if you cannot live without. It is slow and it's a memory eating piece of code. WARNING2: the library is not thread safe (yet?). 
So far none of our callers really need this feature. Please let me know if i overlooked. Please review your ccs init calls around and take appropriate actions. ccs_test(8): not fully completed yet (another email will follow). Feel free to contact me if you have any questions Fabio PS hint: ccs_force_connect() has a blocking option that will idle loop as long as required and will exit the loop when cman is available for queries. This could replace several hand made loops on ccs_connect i have seen around. -- I'm going to make him an offer he can't refuse. From stephan.windmueller at cs.uni-dortmund.de Tue Jun 3 09:27:19 2008 From: stephan.windmueller at cs.uni-dortmund.de (Stephan =?iso-8859-1?Q?Windm=FCller?=) Date: Tue, 3 Jun 2008 11:27:19 +0200 Subject: [Linux-cluster] qdiskd does not start In-Reply-To: <20080603071132.GA8765@speutel.de> References: <20080602124730.GA16072@speutel.de> <20080603071132.GA8765@speutel.de> Message-ID: <20080603092719.GA15653@speutel.de> On Tue, 03. Jun 2008, Stephan Windm?ller wrote: > With strace I see that qdiskd reads /var/run/qdiskd.pid and tries to > access this process (which is not running any more). Even when I delete > this pid-file nothing changes. After reading parts of the source code I think that I found the problem. In qdisk/main.c the function daemon_init is called: | if (daemon_init(argv[0]) < 0) | goto out; But the type of daemon_init is "void" and it does not return a value: | void | daemon_init(char *prog) | { | | [...] | | daemon(0, 0); | | update_pidfile(prog); | } I do not understand why the linker does not produce an error here. Also it seems unwanted that daemon_init dies with "exit(1)" when an error occurs instead of returning -1. However, qdiskd will always exit when daemonized with this code. I removed the comparison < 0 and got this in syslog: | qdiskd: Initial score 3/3 | qdiskd: Initialization complete | qdiskd: Score sufficient for master operation (3/3; required=1); upgrading | qdiskd: Making bid for master | qdiskd: Assuming master role But after that "cman_tool status" hangs and produces no output. - Stephan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From denisb+gmane at gmail.com Tue Jun 3 12:26:59 2008 From: denisb+gmane at gmail.com (denis) Date: Tue, 03 Jun 2008 14:26:59 +0200 Subject: [Linux-cluster] Re: CS5 / what does that means ? In-Reply-To: <4843ED9E.5080109@bull.net> References: <4843ED9E.5080109@bull.net> Message-ID: Alain Moulle wrote: > Hi > > What can be the causes of this message during a relocate of service ? > > #60: Mangled reply from member #1 during RG relocate > > Consequence is that the service remains "starting" and never goes "started". I had the same issue at one time, I debugged the initscripts and configuration of the service in question on both nodes and discovered one had a problem in starting the service. As far as I recall fixing the issue with the broken startup also resolved this "Mangled reply" error. I am not saying this is the case on your system, I just thought I would share my experience. Regards -- Denis From miolinux at libero.it Tue Jun 3 13:53:34 2008 From: miolinux at libero.it (Miolinux) Date: Tue, 03 Jun 2008 15:53:34 +0200 Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck Message-ID: <1212501214.10658.11.camel@GD-P2-093> Hi, I tried to expand my gfs filesystem from 250Gb to 350Gb. I run gfs_grow without any error or warnings. 
But something gone wrong. Now, i cannot mount the gfs filesystem anymore (lock computer) When i try to do a gfs_fsck i get: [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 Initializing fsck Initializing lists... Initializing special inodes... Validating Resource Group index. Level 1 check. 371 resource groups found. (passed) Setting block ranges... This file system is too big for this computer to handle. Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes. Unable to determine the boundaries of the file system. Freeing buffers. --- Like when trying to access a >16Tb on 32bit. But the disk below is just 350Gb!! [root at west ~]# lvdisplay /dev/mapper/VolGroup_FS100-LogVol_FS100 --- Logical volume --- LV Name /dev/VolGroup_FS100/LogVol_FS100 VG Name VolGroup_FS100 LV UUID 6kPwvg-AOuA-iUOY-KboE-PyRO-DPNt-5yeD3h LV Write Access read/write LV Status available # open 0 LV Size 349.99 GB Current LE 89597 Segments 3 Allocation inherit Read ahead sectors 0 Block device 253:17 ----- How can i resolve the issue? / How can i recover the data? Infos: CentOS 5.1 [root at west ~]# rpm -qa|grep -i gfs gfs-utils-0.1.12-1.el5 gfs2-utils-0.1.38-1.el5 kmod-gfs-PAE-0.1.19-7.el5_1.1 kmod-gfs-PAE-0.1.16-6.2.6.18_8.1.15.el5 kmod-gfs-PAE-0.1.19-7.el5 --------- [root at west ~]# uname -a Linux west.polito.it 2.6.18-53.1.21.el5PAE #1 SMP Tue May 20 10:03:06 EDT 2008 i686 i686 i386 GNU/Linux ------- P.s: tried also gfs-utils-0.1.17-1 gfs_fsck but with no luck :( From cma at analog.org Tue Jun 3 14:58:45 2008 From: cma at analog.org (Chris Adams) Date: Tue, 3 Jun 2008 09:58:45 -0500 Subject: [Linux-cluster] gfs 6.1 superblock backups Message-ID: <20080603145845.GA88611@analog.org> Does GFS 6.1 have any superblock backups a la ext2/3? If so, how can I find them? thanks, -chris From s.wendy.cheng at gmail.com Tue Jun 3 15:03:55 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 03 Jun 2008 11:03:55 -0400 Subject: [Linux-cluster] gfs 6.1 superblock backups In-Reply-To: <20080603145845.GA88611@analog.org> References: <20080603145845.GA88611@analog.org> Message-ID: <48455D5B.8080909@gmail.com> Chris Adams wrote: > Does GFS 6.1 have any superblock backups a la ext2/3? If so, how can I > find them? > > Unfortunately, no. From cma at analog.org Tue Jun 3 16:27:55 2008 From: cma at analog.org (Chris Adams) Date: Tue, 3 Jun 2008 11:27:55 -0500 Subject: [Linux-cluster] gfs 6.1 superblock backups Message-ID: <20080603162755.GA89011@analog.org> On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote: Chris Adams wrote: > > Does GFS 6.1 have any superblock backups a la ext2/3? If so, how > > can I find them? > > Unfortunately, no. > If that is the case, then is it safe to assume that fs_sb_format will always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that is the only location on the lv that it is stored? I see #define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */ and that is the location that where I see 0x051d (1309) stored. 
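For what it's worth, this is roughly how I am peeking at that area - a sketch that assumes the superblock really does start at byte 0x10000 (GFS_SB_ADDR * GFS_BASIC_BLOCK, if I'm reading gfs_ondisk.h right) and reuses the device name from my earlier mail:

dd if=/dev/mapper/pool_gfs-pool_gfs bs=1 skip=$((0x10000)) count=32 2>/dev/null | od -A x -t x1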
thanks, -chris From mghofran at caregroup.harvard.edu Tue Jun 3 16:30:04 2008 From: mghofran at caregroup.harvard.edu (mghofran at caregroup.harvard.edu) Date: Tue, 3 Jun 2008 12:30:04 -0400 Subject: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding In-Reply-To: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net> References: <8C22506D4103BE40B23DFE9E04B2D8FE05E35563@GBW607SC0054.GB-WS.net> Message-ID: <1BA553C5537DA74194724A82D9595CCB841DF7@EVS8.its.caregroup.org> One observation: In your bond1 file, shouldn't you have a "type=bonding"? -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patel Dino Sent: Monday, June 02, 2008 2:02 PM To: linux clustering Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding I've updated the e1000 drivers from version7.2.7 to version7.6.15.5 and still getting the same problems. Any more suggestions would be appreciated. D -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Roger Pe?a Sent: Monday, June 02, 2008 3:03 PM To: linux clustering Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding --- On Mon, 6/2/08, Patel Dino wrote: > From: Patel Dino > Subject: RE: [Linux-cluster] RHEL5.0 Cluster fencing problems involving bonding > To: "linux clustering" > Received: Monday, June 2, 2008, 9:48 AM > I think I know what's going on ... > > When I take down the two slave interfaces (eth2 & eth3) > on Node C, the > bond1 interface remains UP. > This means that the Node C still thinks its OK, however it > can not see > Node A & B, and tries to fence Node B. > Node A which is the master fences Node C. > > I'm not sure how to resolve this any help would be > appreciated. as previously said, it is a bond problem I had several problems with bonding e1000 interfaces in RHEL4, and it was a problem with the e1000 driver, as soon as I use a new one, bond start to work properly I don?t know if that is the case with rhel5.0, but maybe it is. you can check the archives of this list if you want to find which version of e1000 driver fix my problem. cu roger __________________________________________________________________ Get a sneak peak at messages with a handy reading pane with All new Yahoo! Mail: http://ca.promos.yahoo.com/newmail/overview2/ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ***************************************************************************** DISCLAIMER The information contained in this e-mail is confidential and is intended for the recipient only. If you have received it in error, please notify us immediately by reply e-mail and then delete it from your system. Please do not copy it or use it for any other purposes, or disclose the content of the e-mail to any other person or store or copy the information in any medium. The views contained in this e-mail are those of the author and not necessarily those of AAH Pharmaceuticals Ltd. 
AAH Pharmaceuticals Ltd is a company incorporated in England and Wales under company number 123458 and whose registered office is at Sapphire Court, Walsgrave Triangle, Coventry, CV2 2TX ***************************************************************************** -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Jun 3 17:23:00 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 03 Jun 2008 12:23:00 -0500 Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck In-Reply-To: <1212501214.10658.11.camel@GD-P2-093> References: <1212501214.10658.11.camel@GD-P2-093> Message-ID: <1212513780.3428.1.camel@technetium.msp.redhat.com> Hi, On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote: > Hi, > > I tried to expand my gfs filesystem from 250Gb to 350Gb. > I run gfs_grow without any error or warnings. > But something gone wrong. > > Now, i cannot mount the gfs filesystem anymore (lock computer) > > When i try to do a gfs_fsck i get: > > [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 > Initializing fsck > Initializing lists... > Initializing special inodes... > Validating Resource Group index. > Level 1 check. > 371 resource groups found. > (passed) > Setting block ranges... > This file system is too big for this computer to handle. > Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes. > Unable to determine the boundaries of the file system. You've probably hit the gfs_grow bug described in bz #434962 (436383) and the gfs_fsck bug described in 440897 (440896). My apologies if you can't read them; permissions to individual bugzilla records are out of my control. The fixes are available in the recently released RHEL5.2, although I don't know when they'll hit Centos. The fixes are also available in the latest cluster git tree if you want to compile/install them from source code yourself. Documentation for doing this can be found at: http://sources.redhat.com/cluster/wiki/ClusterGit Regards, Bob Peterson Red Hat Clustering & GFS From s.wendy.cheng at gmail.com Tue Jun 3 17:43:46 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 03 Jun 2008 13:43:46 -0400 Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck In-Reply-To: <1212513780.3428.1.camel@technetium.msp.redhat.com> References: <1212501214.10658.11.camel@GD-P2-093> <1212513780.3428.1.camel@technetium.msp.redhat.com> Message-ID: <484582D2.20401@gmail.com> Bob Peterson wrote: > Hi, > > On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote: > >> Hi, >> >> I tried to expand my gfs filesystem from 250Gb to 350Gb. >> I run gfs_grow without any error or warnings. >> But something gone wrong. >> >> Now, i cannot mount the gfs filesystem anymore (lock computer) >> >> When i try to do a gfs_fsck i get: >> >> [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 >> Initializing fsck >> Initializing lists... >> Initializing special inodes... >> Validating Resource Group index. >> Level 1 check. >> 371 resource groups found. >> (passed) >> Setting block ranges... >> This file system is too big for this computer to handle. >> Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes. >> Unable to determine the boundaries of the file system. >> > > You've probably hit the gfs_grow bug described in bz #434962 (436383) > and the gfs_fsck bug described in 440897 (440896). My apologies if > you can't read them; permissions to individual bugzilla records are > out of my control. 
> > The fixes are available in the recently released RHEL5.2, although > I don't know when they'll hit Centos. The fixes are also available > in the latest cluster git tree if you want to compile/install them > from source code yourself. Documentation for doing this can > be found at: http://sources.redhat.com/cluster/wiki/ClusterGit > > This is almost qualified as an FAQ entry :) ... -- Wendy From s.wendy.cheng at gmail.com Tue Jun 3 17:56:57 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 03 Jun 2008 13:56:57 -0400 Subject: [Linux-cluster] gfs 6.1 superblock backups In-Reply-To: <20080603162755.GA89011@analog.org> References: <20080603162755.GA89011@analog.org> Message-ID: <484585E9.3060505@gmail.com> Chris Adams wrote: > On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote: > Chris Adams wrote: > >>> Does GFS 6.1 have any superblock backups a la ext2/3? If so, how >>> can I find them? >>> >> Unfortunately, no. >> >> > > If that is the case, then is it safe to assume that fs_sb_format will > always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that > is the only location on the lv that it is stored? I see > #define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */ > and that is the location that where I see 0x051d (1309) stored. > > Yes .. in theory (since I don't have the source code in front of me at this moment). Thinking to hand patch it, don't you ? ... There is a header file (I think it is gfs_ondisk.h) that describes the super block layout. -- Wendy From rpeterso at redhat.com Tue Jun 3 17:55:50 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 03 Jun 2008 12:55:50 -0500 Subject: [Linux-cluster] gfs 6.1 superblock backups In-Reply-To: <20080603162755.GA89011@analog.org> References: <20080603162755.GA89011@analog.org> Message-ID: <1212515750.3428.32.camel@technetium.msp.redhat.com> On Tue, 2008-06-03 at 11:27 -0500, Chris Adams wrote: > On Tue, 2008-06-03 at 11:03 -0400, Wendy Cheng wrote: > Chris Adams wrote: > > > Does GFS 6.1 have any superblock backups a la ext2/3? If so, how > > > can I find them? > > > > Unfortunately, no. > > > > If that is the case, then is it safe to assume that fs_sb_format will > always be bytes 0x1001a and 0x100b on a gfs logical volume, and that that > is the only location on the lv that it is stored? I see > #define GFS_FORMAT_FS (1309) /* Filesystem (all-encompassing) */ > and that is the location that where I see 0x051d (1309) stored. > > thanks, > -chris Hi Chris, As Wendy pointed out, there is only one copy of the GFS superblock. You might be better off recreating the file system with gfs_mkfs and restoring from backup. If that option isn't available, read on: The superblock itself is not too horrible to reconstruct, as long as you know the block size (default is 4096). The big question is: did anything AFTER the superblock get destroyed? A lot depends on what was destroyed. Immediately after the superblock is the first resource group (RG) and its bitmaps, and if they got blasted, it might be difficult to reconstruct your file system. The newer versions of gfs_fsck can repair a lot of these problems though, so once you have a proper GFS superblock, you can give that a try. If the RG was destroyed, gfs_fsck is likely to complain about a lot of things. Right after the first set of bitmaps comes some important system files: journal index, resource group index, etc. If those got destroyed, it's even more difficult or even impossible to get your file system back. 
The quota file follows and then the license file (now reused for fast statfs). After that is the root directory. So you see, it all depends on what all is destroyed and what is still intact. If ONLY the gfs superblock got destroyed, you might be able to use the gfs2_edit tool to patch in the correct values. The superblock ought to look something like this: gfs2_edit - Global File System Editor (use with extreme caution) Block #16 (0x10) of 13092864 (0xC7C800) (superblock) (p.1 of 6) 00010000 01161970 00000001 00000000 00000000 [...p............] 00010010 00000064 00000000 0000051D 00000579 [...d...........y] 00010020 00000000 00001000 0000000C 00000010 [................] 00010030 00000000 00000016 00000000 00000016 [................] 00010040 00000000 00000017 00000000 00000017 [................] 00010050 00000000 0000001A 00000000 0000001A [................] 00010060 6C6F636B 5F646C6D 00000000 00000000 [lock_dlm........] 00010070 00000000 00000000 00000000 00000000 [................] 00010080 00000000 00000000 00000000 00000000 [................] 00010090 00000000 00000000 00000000 00000000 [................] 000100A0 626F6273 5F657878 6F6E3A65 78786F6E [bobs_exxon:exxon] 000100B0 5F6C7600 00000000 00000000 00000000 [_lv.............] 000100C0 00000000 00000000 00000000 00000000 [................] 000100D0 00000000 00000000 00000000 00000000 [................] 000100E0 00000000 00000018 00000000 00000018 [................] 000100F0 00000000 00000019 00000000 00000019 [................] 00010100 00000000 00000000 00000000 00000000 [................] 00010110 00000000 00000000 00000000 00000000 [................] 00010120 00000000 00000000 00000000 00000000 [................] 00010130 00000000 00000000 00000000 00000000 [................] 00010140 00000000 00000000 00000000 00000000 [................] 00010150 00000000 00000000 00000000 00000000 [................] Everything after offset 0x150 should be zeroes on that block. To get a breakdown of the superblock fields, press the "m" key. For my example above, the field breakdown looks like this: Superblock: mh_magic 0x01161970 (hex) mh_type 1 0x1 mh_format 100 0x64 sb_fs_format 1309 0x51d sb_multihost_format 1401 0x579 sb_bsize 4096 0x1000 sb_bsize_shift 12 0xc jindex ino 22 0x16 22 0x16 rindex ino 23 0x17 23 0x17 root dir 26 0x1a 26 0x1a sb_lockproto lock_dlm sb_locktable bobs_exxon:exxon_lv quota ino 24 0x18 24 0x18 license 25 0x19 25 0x19 The 'm' key is a three-way toggle, so you can get back to hex mode by pressing it again once or twice. The gfs2_tool is complex and can be dangerous, so I don't recommend it for file systems that are in production, unless your need is great. Also, never use it when the fs is mounted. The gfs2_edit man page tells how to use it. If this is a RHEL5 system or similar, you'll already have the gfs2_edit tool available to you. If this is RHEL4 you won't have gfs2_edit so your options are: (1) use gfs_edit which is a primitive version of the same tool, (2) I did a port of gfs2_edit for RHEL4. The source tree may be found at: http://people.redhat.com/rpeterso/Experimental/RHEL4.x/ If you go this route, you would have to untar the file, then do: .configure --kernel_src=/usr/src/kernels/(your kernel) make make install This port assumes you have the kernel headers (i.e. kernel-devel) rpms installed. I hope this helps. 
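Before hand-patching anything with gfs2_edit, it is also worth saving an untouched copy of the superblock block so a bad edit can be backed out. A minimal sketch, assuming the default 4 KB block size (superblock = block 16) and an unmounted filesystem; the device path is a placeholder:

dd if=/dev/your_vg/your_lv of=/root/gfs-sb-block16.bin bs=4096 skip=16 count=1
od -A x -t x1 /root/gfs-sb-block16.bin
# offsets in this dump start at 0 rather than 0x10000, and the bytes are shown
# one at a time; the reference layout above groups them into big-endian words.
# If an edit goes wrong, the saved copy can be written back (fs still unmounted):
# dd if=/root/gfs-sb-block16.bin of=/dev/your_vg/your_lv bs=4096 seek=16 count=1 conv=notrunc

Comparing that dump field by field against the reference layout above is a cheap sanity check before and after any change.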
Regards, Bob Peterson Red Hat Clustering & GFS From bkyoung at gmail.com Tue Jun 3 17:55:57 2008 From: bkyoung at gmail.com (Brandon Young) Date: Tue, 3 Jun 2008 12:55:57 -0500 Subject: [Linux-cluster] Fencing Device Question Message-ID: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com> In my GFS cluster, I use DRAC cards as the fencing device for each node. Yesterday, I had a situation where the DRAC card on a particular node had failed, and would not allow remote logins, etc, but it still returned pings. I don't know how long the card had been dead, and I only noticed because I wished to manually fence the node and fencing failed ... which caused me all sorts of other fun to recover the cluster, afterwards. So, I have uncovered a pretty scary bad-case scenario for my cluster configuration. My question is what (if anything) can RHCS/GFS do to determine the health/presence/operation of fencing devices? If it can do something to monitor the fencing devices, and discovers a bad fencing device, what will it do? For example, if I unplug the network cable for the heartbeat, the node will get fenced immediately. I never tested whether the same would happen if I unplugged a fencing device. I haven't delved into the documentation in a while, but I don't remember anything about a way to have redundant fencing devices, like a DRAC and a network power switch. Is there a way? Thoughts, opinions, insight, documentation, etc would be greatly appreciated. -- Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: From cma at analog.org Tue Jun 3 18:27:13 2008 From: cma at analog.org (Chris Adams) Date: Tue, 3 Jun 2008 13:27:13 -0500 Subject: [Linux-cluster] gfs 6.1 superblock backups Message-ID: <20080603182713.GA89586@analog.org> Bob and Wendy, Thank you for your input on this. What I am trying to do is upgrade a GFS 6.0 filesystems which are attached to various RHEL3/CentOS3 systems. After performing the steps which outline the process of going from 3 to 4, but on a CentOS 5 system, I get the problems mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2? Everyt time I reinstalled a system with CentOS 5 and tried to get gfs running again I got the same error. Since I know that this is an unsupported operation, I haven't sought support for this. However, I noticed that my upgraded filesystem had sb_fs_format = 1308. The mount code checks for sb_fs_format == GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2. Since it was neither of these, it kept dying saying that it was a gfs2 fs when mounting it as gfs, and vice versa. Manually modifying sb_fs_format allowed it to mount immediately afterward. A subsequent gfs_fsck completes all passes successfully. Is that sufficient for upgrading the filesystem if the other steps are performed? All fs operations appear to be successful at this point. thanks, -chris From rpeterso at redhat.com Tue Jun 3 18:49:12 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 03 Jun 2008 13:49:12 -0500 Subject: [Linux-cluster] gfs 6.1 superblock backups In-Reply-To: <20080603182713.GA89586@analog.org> References: <20080603182713.GA89586@analog.org> Message-ID: <1212518952.3428.46.camel@technetium.msp.redhat.com> On Tue, 2008-06-03 at 13:27 -0500, Chris Adams wrote: > Bob and Wendy, > Thank you for your input on this. What I am trying to do > is upgrade a GFS 6.0 filesystems which are attached to various > RHEL3/CentOS3 systems. 
After performing the steps which outline the > process of going from 3 to 4, but on a CentOS 5 system, I get the problems > mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2? > Everyt time I reinstalled a system with CentOS 5 and tried to get gfs > running again I got the same error. > > Since I know that this is an unsupported operation, I haven't sought > support for this. However, I noticed that my upgraded filesystem had > sb_fs_format = 1308. The mount code checks for sb_fs_format == > GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2. Since it was > neither of these, it kept dying saying that it was a gfs2 fs when mounting > it as gfs, and vice versa. Manually modifying sb_fs_format allowed it to > mount immediately afterward. A subsequent gfs_fsck completes all passes > successfully. > > Is that sufficient for upgrading the filesystem if the other steps are > performed? All fs operations appear to be successful at this point. > > thanks, > -chris Hey Chris, I really don't know offhand what changed in the file system between the RHEL3 proprietary version of GFS and the version we have today. (There aren't any differences between RHEL4.x and RHEL5.x GFS format). I can't think of a good reason why my predecessors would have changed the file system format ID unless there was something in the file system that changed and needed reorganizing or reformatting. So like you, that makes me concerned about some loose end. However, I do know gfs_fsck pretty well, and if it says the file system is sane, you should be able to trust it. This is just a guess, but perhaps it had something to do with the difference between the old proprietary GFS (i.e. the old license file) and the GFS Red Hat open-sourced (i.e. empty license file because no license is needed to use it). If I'm correct, it's not likely to cause any problems. There are a few developers from that era around; maybe they'll remember what changed back then and post why it was done. Regards, Bob Peterson Red Hat Clustering & GFS From fdinitto at redhat.com Tue Jun 3 19:19:40 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 3 Jun 2008 21:19:40 +0200 (CEST) Subject: [Linux-cluster] Announcing Perl bindings for libcman In-Reply-To: References: Message-ID: Hi, On Mon, 2 Jun 2008, S. Zachariah Sprackett wrote: > Hello, > > I'd like to announce the availability of my Perl bindings for libcman. > > You can grab them from here: > > http://zac.sprackett.com/cman/cluster-cman-0.01.tar.gz this looks really good. What I would really love to see is a set of perl and python bindings for our shared libraries and part of our official releases. As we discussed on IRC, i'd like them for our master branch in git for libccs, libcman, libdlm and libfence. In master (pre3): libccs from cluster/config/libs/libccsconfdb/ libcman from cluster/cman/lib libdlm from cluster/dlm/libdlm libfence from cluster/fence/libfence (careful there is also a libfenced that we don't need) I believe that all the API's in these libraries are stable by now, but i can't guarantee that 100% yet. Please submit what you like and in your preferred format (patches tho would be best). I noticed that you used GPL2 licence and that's perfect. Make _absolutely_ sure that you take copyright and credits for your work :) Thanks a lot for your contribution Fabio -- I'm going to make him an offer he can't refuse. 
From s.wendy.cheng at gmail.com Tue Jun 3 23:15:41 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 03 Jun 2008 19:15:41 -0400 Subject: [Linux-cluster] gfs 6.1 superblock backups In-Reply-To: <1212518952.3428.46.camel@technetium.msp.redhat.com> References: <20080603182713.GA89586@analog.org> <1212518952.3428.46.camel@technetium.msp.redhat.com> Message-ID: <4845D09D.6050902@gmail.com> Bob Peterson wrote: > On Tue, 2008-06-03 at 13:27 -0500, Chris Adams wrote: > >> Bob and Wendy, >> Thank you for your input on this. What I am trying to do >> is upgrade a GFS 6.0 filesystems which are attached to various >> RHEL3/CentOS3 systems. After performing the steps which outline the >> process of going from 3 to 4, but on a CentOS 5 system, I get the problems >> mentioned in my message yesterday Re: /sbin/mount.gfs thinks fs is gfs2? >> Everyt time I reinstalled a system with CentOS 5 and tried to get gfs >> running again I got the same error. >> >> Since I know that this is an unsupported operation, I haven't sought >> support for this. However, I noticed that my upgraded filesystem had >> sb_fs_format = 1308. The mount code checks for sb_fs_format == >> GFS_FORMAT_FS for gfs 6.1 and GFS2_FORMAT_FS for gfs2. Since it was >> neither of these, it kept dying saying that it was a gfs2 fs when mounting >> it as gfs, and vice versa. Manually modifying sb_fs_format allowed it to >> mount immediately afterward. A subsequent gfs_fsck completes all passes >> successfully. >> >> Is that sufficient for upgrading the filesystem if the other steps are >> performed? All fs operations appear to be successful at this point. >> >> thanks, >> -chris >> > > I can't think of a good reason why my predecessors would have changed > the file system format ID unless there was something in the file system > that changed and needed reorganizing or reformatting. I'm not the person who added this ID but it is a *right* thing to do. As a rule of thumb, when moving between major releases, such as RHEL3 and RHEL4, a filesystem needs to have an identifier to facilitate the upgrade process. There should be documents, commands and/or tools to guide people how to do the upgrade - all require this type of "ID" implementation. And there should be associated testing efforts allocated to the upgrade command as a safe guard before you can call a filesystem "enterprise product". For GFS specifically, the locking protocols are different between GFS 6.0 and 6.1 (e.g. GULM is in RHEL3 but not in RHEL4) and locking protocol is part of the superblock structure, iirc. From practical point of view, it is probably ok to keep going (but do check RHEL manuals - there should be chapters talking about migration and upgrade between RHEL3 to 4 and RHEL4 to 5). From process point of view, this looks like a RHEL5 bug to me. -- Wendy From miolinux at libero.it Wed Jun 4 08:23:55 2008 From: miolinux at libero.it (Miolinux) Date: Wed, 04 Jun 2008 10:23:55 +0200 Subject: [Linux-cluster] Error with gfs_grow/ gfs_fsck In-Reply-To: <484582D2.20401@gmail.com> References: <1212501214.10658.11.camel@GD-P2-093> <1212513780.3428.1.camel@technetium.msp.redhat.com> <484582D2.20401@gmail.com> Message-ID: <1212567835.7752.3.camel@GD-P2-093> On Tue, 2008-06-03 at 13:43 -0400, Wendy Cheng wrote: > Bob Peterson wrote: > > Hi, > > > > On Tue, 2008-06-03 at 15:53 +0200, Miolinux wrote: > > > >> Hi, > >> > >> I tried to expand my gfs filesystem from 250Gb to 350Gb. > >> I run gfs_grow without any error or warnings. > >> But something gone wrong. 
> >> > >> Now, i cannot mount the gfs filesystem anymore (lock computer) > >> > >> When i try to do a gfs_fsck i get: > >> > >> [root at west ~]# gfs_fsck -v /dev/mapper/VolGroup_FS100-LogVol_FS100 > >> Initializing fsck > >> Initializing lists... > >> Initializing special inodes... > >> Validating Resource Group index. > >> Level 1 check. > >> 371 resource groups found. > >> (passed) > >> Setting block ranges... > >> This file system is too big for this computer to handle. > >> Last fs block = 0x1049c5c47, but sizeof(unsigned long) is 4 bytes. > >> Unable to determine the boundaries of the file system. > >> > > > > You've probably hit the gfs_grow bug described in bz #434962 (436383) > > and the gfs_fsck bug described in 440897 (440896). My apologies if > > you can't read them; permissions to individual bugzilla records are > > out of my control. > > > > The fixes are available in the recently released RHEL5.2, although > > I don't know when they'll hit Centos. The fixes are also available > > in the latest cluster git tree if you want to compile/install them > > from source code yourself. Documentation for doing this can > > be found at: http://sources.redhat.com/cluster/wiki/ClusterGit > > > > > This is almost qualified as an FAQ entry :) ... > > -- Wendy > > -- Yes, indeed i followed instruction in ?Mikko Partio thread and now it seems working, however i had to install a new computer with a 64bit OS, and compiled a 64bit version of gfs_fsck to fsck the broken disk. Thanks, Miolinux From Alain.Moulle at bull.net Wed Jun 4 09:14:42 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 04 Jun 2008 11:14:42 +0200 Subject: [Linux-cluster] CS5 / tuning token and consequence on dlm Message-ID: <48465D02.2020104@bull.net> Hi With CS5 : Is there always a link to the value to set for : DLM_LOCK_TIMEOUT if the token default is modified in cluster.conf ???? (with CS4, the modification of deadnode_timer was to be linked to a modification of the DLM_LOCK_TIMEOUT) Thanks Regards Alain Moull? From sunhux at gmail.com Wed Jun 4 10:49:24 2008 From: sunhux at gmail.com (sunhux G) Date: Wed, 4 Jun 2008 18:49:24 +0800 Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine Message-ID: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com> Hi Christine, I could have searched Redhat knowledgebase but thought would be easier if I clarify here. We plan to cluster two RHES, server A & server B (on Ver 5.1AP) a)besides the regular network port for the usual network traffic, we only need one additional network port per server to set up the clustering, is this right? b)what if we want to use 2 network ports, then we have to bond the two network ports on server A & the two network ports on server B - is this right? c)anything we need to do on the Cisco switch's ports end? We are using Cisco 6513 Thanks U On 6/2/08, Christine Caulfield wrote: > > Jakub Suchy wrote: > >> Hi, >> I would like to know, if it's possible to run heartbeat (through cman) >> over two dedicated network NICs. AFAIK, in old hearbeat code, it was >> possible using serial + NIC. Unfortunately, I was unable to find this in >> any documentation and this is the first time a customer is requesting >> this. (I am not talking about network bonding). >> >> > Basically, no. > > If you want to use 2 NICs then bonding is what you need. cman can use dual > NICs after a fashion but it's not supported and even less well tested. > > Sorry. 
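For what it's worth, the dedicated cluster interconnect being discussed here is just a standard RHEL-style bonding setup. The sketch below is an illustration only; the interface names, the private address and the mode=1/miimon=100 options are assumptions to adapt (and if a bond0 already exists, the bonding module options need extra care):

# /etc/sysconfig/network-scripts/ifcfg-bond1   (private cluster interconnect)
DEVICE=bond1
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth2    (repeat for eth3 with DEVICE=eth3)
DEVICE=eth2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/modprobe.conf
alias bond1 bonding
options bond1 mode=1 miimon=100

cman binds to whatever address the node names in cluster.conf resolve to, so pointing those names at the bond1 addresses is what actually keeps the cluster traffic on the private network.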
> > -- > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.fuerstenau at oce.com Wed Jun 4 12:12:21 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Wed, 4 Jun 2008 14:12:21 +0200 Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine In-Reply-To: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com> References: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com> Message-ID: <1212581541.19889.24.camel@lx002140.ops.de> Hi, in my config here I use 2 dual port network cards in each node. I run a 2 node cluster. The nodes are in two racks in the same room. Port 1 of Card 1 and Port 1 of card 2 are bonded to bond0 and are (for fail over and redundancy) connected to 2 Cisco switches. This configuration is save even in the case if on network card will fail. Port 2 of Card 1 and Port 2 of card 2 are bionded to interface bond1. This interface has a private non routed address (192.168....) and is connected to the second with 2 crossed network cables. Therefore I need no switch for the cluster internal traffic. And that means more security because a nonexixting switch can not fail. This configuration works well now for the last two years. Yours Martin F?rstenau Oce Printing Systems On Wed, 2008-06-04 at 18:49 +0800, sunhux G wrote: > Hi Christine, > > > I could have searched Redhat knowledgebase but thought would > be easier if I clarify here. We plan to cluster two RHES, server A > & server B (on Ver 5.1AP) > > a)besides the regular network port for the usual network traffic, > we only need one additional network port per server to set up > the clustering, is this right? > > b)what if we want to use 2 network ports, then we have to bond > the two network ports on server A & the two network ports on > server B - is this right? > > c)anything we need to do on the Cisco switch's ports end? We > are using Cisco 6513 > > > Thanks > U > > On 6/2/08, Christine Caulfield wrote: > Jakub Suchy wrote: > Hi, > I would like to know, if it's possible to run > heartbeat (through cman) > over two dedicated network NICs. AFAIK, in old > hearbeat code, it was > possible using serial + NIC. Unfortunately, I was > unable to find this in > any documentation and this is the first time a > customer is requesting > this. (I am not talking about network bonding). > > > Basically, no. > > If you want to use 2 NICs then bonding is what you need. cman > can use dual NICs after a fashion but it's not supported and > even less well tested. > > Sorry. > > -- > > Chrissie > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. 
Thank you for your co-operation. From johannes.russek at io-consulting.net Wed Jun 4 12:15:42 2008 From: johannes.russek at io-consulting.net (Johannes Russek) Date: Wed, 04 Jun 2008 14:15:42 +0200 Subject: [Linux-cluster] Fencing Device Question In-Reply-To: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com> References: <824ffea00806031055n6c02701fh8a7fa9587727217e@mail.gmail.com> Message-ID: <4846876E.7080406@io-consulting.net> > My question is what (if anything) can RHCS/GFS do to determine the > health/presence/operation of fencing devices? If it can do something > to monitor the fencing devices, and discovers a bad fencing device, > what will it do? For example, if I unplug the network cable for the > heartbeat, the node will get fenced immediately. I never tested > whether the same would happen if I unplugged a fencing device. I > haven't delved into the documentation in a while, but I don't remember > anything about a way to have redundant fencing devices, like a DRAC > and a network power switch. Is there a way? You should be able to add as many fencing devices as you like, cman should go through them top to bottom, if it won't get a positive response from the fencing script. in my case i have IPMI, then network power switch, then fabric fencing. Regards, Johannes > > Thoughts, opinions, insight, documentation, etc would be greatly > appreciated. > > -- > Brandon > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From Alain.Moulle at bull.net Wed Jun 4 12:19:15 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 04 Jun 2008 14:19:15 +0200 Subject: [Linux-cluster] CS5 / is there a tunable timer between the three start/stop tries ? Message-ID: <48468843.5040300@bull.net> Hi With CS5, when the status of a service returns failed, the CS5 tries to start three times the service , so we can see three start/stop sequences if it does not start correctly each time. The following start is always launchec just after the stop, is there a tunable timer between the three start/stop tries ? Regards Alain Moull? From mgrac at redhat.com Wed Jun 4 12:31:41 2008 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Wed, 04 Jun 2008 14:31:41 +0200 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> References: <483ECA36.7070007@xbe.ch> <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> Message-ID: <48468B2D.7060509@redhat.com> Hi, Ron Cronenwett wrote: > I found if I did not configure SELinux with setenforce permissive, the > /usr/share/cluster/apache.sh script did not execute. Once that runs, > it creates > /etc/cluster/apache/apache:"name". In that subdirectory, the script > creates an httpd.conf file from /etc/httpd/httpd.conf. I also found > the new httpd.conf > had the Listen statement commented out even though I had set it to my > clustered address in /etc/httpd/httpd. I needed to manually uncomment > the > Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf. > IP addresses for Apache (same for MySQL, PgSQL, tomcat, ...) are taken from the configuration. This is the reason why original values are commented and replaced with those from cluster.conf (ip address should be a child to service and sibling to apache - as you can use this IP address for different resource agents) m, -- Marek Grac Red Hat Czech s.r.o. 
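In cluster.conf terms, that parent/child arrangement looks roughly like the fragment below. It is only a sketch: the address is a placeholder, and the apache attributes other than name= should be checked against the agent metadata in /usr/share/cluster/apache.sh before relying on them. The fragment is written to a scratch file here so it can be merged into /etc/cluster/cluster.conf by hand:

cat > /tmp/service-fragment.xml <<'EOF'
<service autostart="1" name="test_proxy_http">
  <ip address="192.168.0.10" monitor_link="1"/>
  <apache name="test_httpd" server_root="/etc/httpd" config_file="conf/httpd.conf"/>
</service>
EOF
# after merging the fragment and bumping config_version in cluster.conf:
# ccs_tool update /etc/cluster/cluster.conf

With the ip as a sibling of the apache resource, apache.sh generates its own copy of httpd.conf under /etc/cluster/apache/apache:test_httpd/ with Listen rewritten to that address, which is the behaviour Ron and Lorenz were seeing.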
From Alain.Moulle at bull.net Wed Jun 4 12:47:21 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Wed, 04 Jun 2008 14:47:21 +0200 Subject: [Linux-cluster] CS5 / about loop "Node is undead" Message-ID: <48468ED9.3050401@bull.net> Hi About my problem of node entering a loop : Jun 3 15:54:49 s_sys at xn2 qdiskd[22256]: Writing eviction notice for node 1 Jun 3 15:54:50 s_sys at xn2 qdiskd[22256]: Node 1 evicted Jun 3 15:54:51 s_sys at xn2 qdiskd[22256]: Node 1 is undead. I notice that just before entering this loop, I have a message : Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1" Jun 3 15:54:48 s_sys at xn2 qdiskd[22256]: Assuming master role but never the message : Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success Nethertheless, the service of xn1 is well failovered by xn2, but then after the reboot of xn1, we can't start again the CS5 due to the problem of infernal loop "Node is undead" on xn2. whereas when it works correctly, both messages : fencing node "xn1" fence "xn1" success are successive (after about 30s) So my question is : could this pb of infernal loop "Node is undead" be systematically due to a failed fencing phase of xn2 towards xn1 ? PS: note that I have applied patch : http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Thanks Regards Alain Moull? From lp at xbe.ch Wed Jun 4 13:31:45 2008 From: lp at xbe.ch (Lorenz Pfiffner) Date: Wed, 04 Jun 2008 15:31:45 +0200 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> References: <483ECA36.7070007@xbe.ch> <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> Message-ID: <48469941.6030800@xbe.ch> Hi Ron Thanks for replying! Your answer gave me some tipps, but none of them worked for me. I don't have SELinux enabled or permissive, it's disabled anyway. I couldn't make it working with the apache resource. For me it seems quite unstable and it's nowhere really mentioned in any documentation I found. So please, if any RedHat guy is reading this, can you please improve this feature and put it into the official documentation. For example, why does the apache.sh script change the "Listen" directive? How can I execute apache.sh manually to debug the resource? My workaround: I altered the default httpd script and made a script resource. In that case it's working as expected. The only thing that bothers me quite a lot is the relocation time. It takes about 50 to 60 seconds to relocate 5 IPs, a GFS mount and the apache script resource! Is this a reasonable time? On older clusters I remember times around 5 to 10 seconds. Kind regards Lorenz Ron Cronenwett wrote: > Hi Lorenz > > I had a similar problem while testing with Centos 5.1 on a VMWare > workstation setup. One more difference, I have been using > system-config-cluster > to configure the cluster. Luci seemed to be giving me problems with > setting up a mount of an NFS export. But I have not retried Luci since > changing > the selinux setting I mention below. > > I found if I did not configure SELinux with setenforce permissive, the > /usr/share/cluster/apache.sh script did not execute. Once that runs, > it creates > /etc/cluster/apache/apache:"name". In that subdirectory, the script > creates an httpd.conf file from /etc/httpd/httpd.conf. I also found > the new httpd.conf > had the Listen statement commented out even though I had set it to my > clustered address in /etc/httpd/httpd. 
I needed to manually uncomment > the > Listen statement on each node in /etc/cluster/apache/apache:"name"/httpd.conf. > > Hope this helps. > > Ron C. > > > > On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner wrote: >> Hello everybody >> >> I have the following test setup: >> >> - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1 >> - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test) >> - 4 IP resources defined >> - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk >> >> Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like >> this: >> >> May 28 16:18:15 testsrv clurgmgrd: [18475]: Starting Service apache:test_httpd > Failed >> May 28 16:18:15 testsrv clurgmgrd[18475]: start on apache "test_httpd" returned 1 (generic error) >> May 28 16:18:15 testsrv clurgmgrd[18475]: #68: Failed to start service:test_proxy_http; return value: 1 >> May 28 16:18:15 testsrv clurgmgrd[18475]: Stopping service service:test_proxy_http >> May 28 16:18:16 testsrv clurgmgrd: [18475]: Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist >> May 28 16:18:16 testsrv clurgmgrd: [18475]: Stopping Service apache:test_httpd > Failed >> May 28 16:18:16 testsrv clurgmgrd[18475]: stop on apache "test_httpd" returned 1 (generic error) >> May 28 16:18:16 testsrv clurgmgrd[18475]: #71: Relocating failed service service:test_proxy_http >> >> I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with >> a "Script Resource". >> >> Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation. >> >> Kind Regards >> Lorenz >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Wed Jun 4 13:38:05 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 04 Jun 2008 14:38:05 +0100 Subject: [Linux-cluster] heartbeat over 2 NICs - Hi Christine In-Reply-To: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com> References: <60f08e700806040349u344c1bdakcf67ba6ee9492c18@mail.gmail.com> Message-ID: <48469ABD.3030409@redhat.com> sunhux G wrote: > Hi Christine, > > > I could have searched Redhat knowledgebase but thought would > be easier if I clarify here. We plan to cluster two RHES, server A > & server B (on Ver 5.1AP) > > a)besides the regular network port for the usual network traffic, > we only need one additional network port per server to set up > the clustering, is this right? That is highly recommended, yes. You can run with just the one interface (or two bonded) but we always recommend that the cluster traffic is isolated from a main serving network > b)what if we want to use 2 network ports, then we have to bond > the two network ports on server A & the two network ports on > server B - is this right? That right, yes. > c)anything we need to do on the Cisco switch's ports end? 
We > are using Cisco 6513 > Almost certainly :) I'm no expert on cisco switches but there is some information about running openais over them here: http://openais.org/doku.php?id=faq:cisco_switches -- Chrissie From T.Kumar at alcoa.com Wed Jun 4 13:53:05 2008 From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS)) Date: Wed, 4 Jun 2008 09:53:05 -0400 Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. In-Reply-To: <20080531160007.B6F1061A461@hormel.redhat.com> References: <20080531160007.B6F1061A461@hormel.redhat.com> Message-ID: <0C3FC6B507AF684199E57BFCA3EAB5532582D7DB@NOANDC-MXU11.NOA.Alcoa.com> here is the RHEL version details. # cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.1 (Tikanga) -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of linux-cluster-request at redhat.com Sent: Saturday, May 31, 2008 12:00 PM To: linux-cluster at redhat.com Subject: Linux-cluster Digest, Vol 49, Issue 39 Send Linux-cluster mailing list submissions to linux-cluster at redhat.com To subscribe or unsubscribe via the World Wide Web, visit https://www.redhat.com/mailman/listinfo/linux-cluster or, via email, send a message with subject or body 'help' to linux-cluster-request at redhat.com You can reach the person managing the list at linux-cluster-owner at redhat.com When replying, please edit your Subject line so it is more specific than "Re: Contents of Linux-cluster digest..." Today's Topics: 1. Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. (Kumar, T Santhosh (TCS)) 2. Re: Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. (Roger Pe?a) ---------------------------------------------------------------------- Message: 1 Date: Fri, 30 May 2008 13:25:07 -0400 From: "Kumar, T Santhosh \(TCS\)" Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. To: Message-ID: <0C3FC6B507AF684199E57BFCA3EAB5532565630D at NOANDC-MXU11.NOA.Alcoa.com> Content-Type: text/plain; charset="us-ascii" I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm along with the other three dependencies listed below. lvm2-cluster-2.02.32-4.el5.x86_64.rpm device-mapper-event-1.02.24-1.el5.x86_64.rpm device-mapper-1.02.24-1.el5.x86_64.rpm I prefer to do this as I realise the below. lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which resolves the "clvmd -R did not work as expected". Do any one know of any problems which might come with upgrading the lvm2, device mapper packages. ------------------------------ Message: 2 Date: Fri, 30 May 2008 11:14:41 -0700 (PDT) From: Roger Pe?a Subject: Re: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. To: linux clustering Message-ID: <767810.9219.qm at web50605.mail.re2.yahoo.com> Content-Type: text/plain; charset=us-ascii --- On Fri, 5/30/08, Kumar, T Santhosh (TCS) wrote: > From: Kumar, T Santhosh (TCS) > Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. > To: linux-cluster at redhat.com > Received: Friday, May 30, 2008, 1:25 PM > I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm > along with > the other three dependencies listed below. > > lvm2-cluster-2.02.32-4.el5.x86_64.rpm > device-mapper-event-1.02.24-1.el5.x86_64.rpm > device-mapper-1.02.24-1.el5.x86_64.rpm > > I prefer to do this as I realise the below. 
> > lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which > resolves the > "clvmd -R did not work as expected". > > Do any one know of any problems which might come with > upgrading the > lvm2, device mapper packages. I suggest you to take a look in bugzilla. I dont have a linux server in my hand right now to check so I dont know tom what RHEL release you are refering, but we got some clvm problems when we update a RHEL4.5 to RHEL4.6 + update. and also there is bug, fixed for 5.2 but dont know for 4.6, that I think you should into, it was discussed in this list days ago (subject: LVM manager or something) cu roger __________________________________________________________________ Be smarter than spam. See how smart SpamGuard is at giving junk email the boot with the All-new Yahoo! Mail. Click on Options in Mail and switch to New Mail today or register for free at http://mail.yahoo.ca ------------------------------ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster End of Linux-cluster Digest, Vol 49, Issue 39 ********************************************* From kri_thi at yahoo.com Wed Jun 4 14:53:09 2008 From: kri_thi at yahoo.com (krishnamurthi G) Date: Wed, 4 Jun 2008 07:53:09 -0700 (PDT) Subject: [Linux-cluster] Any group for VCS cluster Message-ID: <412748.77396.qm@web90407.mail.mud.yahoo.com> Hi, Is there any group to get more info on VCS cluster. Thanks in advance -Krishna -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.kovacs at gmail.com Wed Jun 4 18:18:47 2008 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Wed, 4 Jun 2008 19:18:47 +0100 Subject: [Linux-cluster] gfs_controld Message-ID: <7d6e8da40806041118p73484d53r3c15510dfb536d9c@mail.gmail.com> Previous to a recent upgrade to RHEL5.2 from RHEL5.1, I was using KDE as my default desktop with a home dir mounted from and nfs exported gfs2 filesystem. After the upgrade, kde hangs due to hundreds (even thousands) of the following errors.... gfs_controld[XXXX]: plock result write err 0 errno 2 the exports are nfs ver 3 (i have some older clients) ant proto=udp is this a known issue? is there a fix available? thanks -corey From kri_thi at yahoo.com Thu Jun 5 09:40:51 2008 From: kri_thi at yahoo.com (krishnamurthi G) Date: Thu, 5 Jun 2008 02:40:51 -0700 (PDT) Subject: [Linux-cluster] Any group for VCS cluster Message-ID: <576986.63897.qm@web90407.mail.mud.yahoo.com> Hi , As part of port activity we are planning to port VCS cluster on Windows. We will make use of CLI on UNIX, whereas API are being used on Windows. I am newbie to Windows world. I would appreciate if somebody give me pointers/referrance or any active group ( specify group name). Warm Regards - Krishna ----- Original Message ---- From: krishnamurthi G To: linux clustering Sent: Wednesday, June 4, 2008 8:23:09 PM Subject: [Linux-cluster] Any group for VCS cluster Hi, Is there any group to get more info on VCS cluster. Thanks in advance -Krishna -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mgrac at redhat.com Thu Jun 5 15:24:24 2008 From: mgrac at redhat.com (Marek 'marx' Grac) Date: Thu, 05 Jun 2008 17:24:24 +0200 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <48469941.6030800@xbe.ch> References: <483ECA36.7070007@xbe.ch> <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> <48469941.6030800@xbe.ch> Message-ID: <48480528.6000708@redhat.com> Hi, Lorenz Pfiffner wrote: > Hi Ron > > I couldn't make it working with the apache resource. For me it seems > quite unstable and it's nowhere really mentioned in any documentation > I found. So please, if any RedHat guy is reading this, can you please > improve this feature and put it into the official documentation. For > example, why does the apache.sh script change the "Listen" directive? Look at my previous post to this thread. IMHO unstable is something that does not work. > How can I execute apache.sh manually to debug the resource? > If you want to debug, the best way is to run resource group manager in debug mode. So stop it in all machines, and run clurgmgrd -fd (stay forward and debug). Resource agents tries to log as much as is useful and you will see everything on output. If you want to run this script directly, you will have to setup all environment variables OCF_*. > My workaround: I altered the default httpd script and made a script > resource. In that case it's working as expected. The only thing that > bothers me quite a lot is the relocation time. It takes about 50 to 60 > seconds to relocate 5 IPs, a GFS mount and the apache script resource! > Is this a reasonable time? On older clusters I remember times around 5 > to 10 seconds. Default init script for httpd, mysqld, ... will work for you if you have only one httpd on your cluster. It is not suitable for running several instances on same machine. This is one of the reasons why we need resource agents. -- Marek Grac Red Hat Czech s.r.o. From david.costakos at gmail.com Thu Jun 5 20:44:39 2008 From: david.costakos at gmail.com (Dave Costakos) Date: Thu, 5 Jun 2008 13:44:39 -0700 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 In-Reply-To: <48469941.6030800@xbe.ch> References: <483ECA36.7070007@xbe.ch> <9c649280806010537o471d9c2ex159f151a5d9e1433@mail.gmail.com> <48469941.6030800@xbe.ch> Message-ID: <6b6836c60806051344m345b05a5x96e4bdd43fdffe4b@mail.gmail.com> For what it's worth, Lorenz, sometimes it's the simplest things that cause errors. I had this same error. It turned out that the parent directory for the pid file didn't exist. It's complaining about /var/run/cluster/apache/apache:test_httpd.pid. In my case /var/run/cluster existed but /var/run/cluster/apache did not. Can you confirm that /var/run/cluster/apache exists? -Dave. On Wed, Jun 4, 2008 at 6:31 AM, Lorenz Pfiffner wrote: > Hi Ron > > Thanks for replying! Your answer gave me some tipps, but none of them > worked for me. I don't have SELinux enabled or permissive, it's disabled > anyway. > > I couldn't make it working with the apache resource. For me it seems quite > unstable and it's nowhere really mentioned in any documentation I found. So > please, if any RedHat guy is reading this, can you please improve this > feature and put it into the official documentation. For example, why does > the apache.sh script change the "Listen" directive? How can I execute > apache.sh manually to debug the resource? > > My workaround: I altered the default httpd script and made a script > resource. In that case it's working as expected. 
The only thing that bothers > me quite a lot is the relocation time. It takes about 50 to 60 seconds to > relocate 5 IPs, a GFS mount and the apache script resource! Is this a > reasonable time? On older clusters I remember times around 5 to 10 seconds. > > Kind regards > Lorenz > > > Ron Cronenwett wrote: > >> Hi Lorenz >> >> I had a similar problem while testing with Centos 5.1 on a VMWare >> workstation setup. One more difference, I have been using >> system-config-cluster >> to configure the cluster. Luci seemed to be giving me problems with >> setting up a mount of an NFS export. But I have not retried Luci since >> changing >> the selinux setting I mention below. >> >> I found if I did not configure SELinux with setenforce permissive, the >> /usr/share/cluster/apache.sh script did not execute. Once that runs, >> it creates >> /etc/cluster/apache/apache:"name". In that subdirectory, the script >> creates an httpd.conf file from /etc/httpd/httpd.conf. I also found >> the new httpd.conf >> had the Listen statement commented out even though I had set it to my >> clustered address in /etc/httpd/httpd. I needed to manually uncomment >> the >> Listen statement on each node in >> /etc/cluster/apache/apache:"name"/httpd.conf. >> >> Hope this helps. >> >> Ron C. >> >> >> >> On Thu, May 29, 2008 at 11:22 AM, Lorenz Pfiffner wrote: >> >>> Hello everybody >>> >>> I have the following test setup: >>> >>> - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1 >>> - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a >>> test) >>> - 4 IP resources defined >>> - GFS over DRBD, doesn't matter, because it doesn't even work on a local >>> disk >>> >>> Now I would like to have an "Apache Resource" which i can select in the >>> luci interface. I assume it's using the /usr/share/cluster/apache.sh script. >>> If I try to start it, the error message looks like >>> this: >>> >>> May 28 16:18:15 testsrv clurgmgrd: [18475]: Starting Service >>> apache:test_httpd > Failed >>> May 28 16:18:15 testsrv clurgmgrd[18475]: start on apache >>> "test_httpd" returned 1 (generic error) >>> May 28 16:18:15 testsrv clurgmgrd[18475]: #68: Failed to start >>> service:test_proxy_http; return value: 1 >>> May 28 16:18:15 testsrv clurgmgrd[18475]: Stopping service >>> service:test_proxy_http >>> May 28 16:18:16 testsrv clurgmgrd: [18475]: Checking Existence Of >>> File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > >>> Failed - File Doesn't Exist >>> May 28 16:18:16 testsrv clurgmgrd: [18475]: Stopping Service >>> apache:test_httpd > Failed >>> May 28 16:18:16 testsrv clurgmgrd[18475]: stop on apache >>> "test_httpd" returned 1 (generic error) >>> May 28 16:18:16 testsrv clurgmgrd[18475]: #71: Relocating >>> failed service service:test_proxy_http >>> >>> I've another cluster in which I had to alter the default init.d/httpd >>> script to be able to run multiple apache instances (not vhosts) on one >>> server. But there I have the Apache Service configured with >>> a "Script Resource". >>> >>> Is this supposed to work of is it a feature in development? I don't see >>> something like "Apache Resource" in the current documentation. 
>>> >>> Kind Regards >>> Lorenz >>> >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >> >> > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Dave Costakos mailto:david.costakos at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rfpike at fedex.com Fri Jun 6 19:45:37 2008 From: rfpike at fedex.com (Robbie Pike) Date: Fri, 6 Jun 2008 14:45:37 -0500 Subject: [Linux-cluster] cluster.conf settings Message-ID: I'm working on procedures for installing Cluster Suite and setting up cluster. I always try to do everything command-line first before using anything like conga or modifying the configuration file directly. Is there a way to add fence_daemon post_join_delay post_fail_delay settings to the cluster.conf using ccs_tool? What things can only be added to the cluster.conf by editing the file? Any help is appreciated. R. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Mon Jun 9 07:42:30 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 9 Jun 2008 09:42:30 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.04 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 5th release from the master branch: 2.99.04. The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.04 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc3) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.04.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.03): Bob Peterson (3): Fix gfs2_edit bugs with non-4K block sizes Make gfs2_edit more friendly to automated testing. Updates to gfs2_edit man page for new option. Fabio M. 
Di Nitto (12): [MISC] Make several API's private again [CONFIG] Add full xpath support to libccs [CMAN] Bump library version [BUILD] Switch libdlmcontrol back to shared library [BUILD] Collapse common library makefile bits in libs.mk [MISC] Remove obsolete and empty files [MISC] Add top level licence files [MISC] Cleanup licence, copyright and header duplication [MISC] Tree cleanup [BUILD] Prepare infrastructure for perl/python bindings [GNBD/FENCE] Move fence_gnbd agent where it belongs [MISC] Update top level copyright file Marek 'marx' Grac (3): [FENCE] Fix #446995: Unknown option [FENCE] Fix: 447378: fence_apc unable to connect via ssh to APC 7900 Fixes #445662: names of resources with spaces are mishandled Mark Hlawatschek (1): mount.gfs2: skip mtab updates COPYING.applications | 339 +++ COPYING.libraries | 510 ++++ COPYRIGHT | 230 ++ Makefile | 15 +- README.licence | 40 + bindings/Makefile | 4 + bindings/perl/Makefile | 4 + bindings/python/Makefile | 4 + ccs/Makefile | 12 - ccs/ccs_tool/Makefile | 12 - ccs/ccs_tool/editconf.c | 12 - ccs/ccs_tool/editconf.h | 12 - ccs/ccs_tool/old_parser.c | 12 - ccs/ccs_tool/update.c | 11 - ccs/ccs_tool/update.h | 12 - ccs/ccs_tool/upgrade.c | 11 - ccs/ccs_tool/upgrade.h | 12 - ccs/ccsais/Makefile | 12 - ccs/ccsais/config.c | 11 - ccs/daemon/Makefile | 12 - ccs/daemon/ccsd.c | 11 - ccs/daemon/cluster_mgr.c | 11 - ccs/daemon/cluster_mgr.h | 11 - ccs/daemon/cnx_mgr.c | 11 - ccs/daemon/cnx_mgr.h | 12 - ccs/daemon/globals.c | 11 - ccs/daemon/globals.h | 11 - ccs/daemon/misc.c | 11 - ccs/daemon/misc.h | 11 - ccs/include/comm_headers.h | 12 - ccs/include/debug.h | 12 - ccs/libccscompat/Makefile | 28 +- ccs/libccscompat/libccscompat.c | 11 - ccs/libccscompat/libccscompat.h | 11 - ccs/man/Makefile | 13 - ccs/man/ccs.7 | 6 - ccs/man/ccs_tool.8 | 7 - ccs/man/ccsd.8 | 7 - ccs/man/cluster.conf.5 | 4 - cman/Makefile | 13 - cman/cman_tool/Makefile | 12 - cman/cman_tool/cman_tool.h | 13 - cman/cman_tool/join.c | 13 - cman/cman_tool/main.c | 13 - cman/daemon/Makefile | 12 - cman/daemon/ais.c | 12 - cman/daemon/ais.h | 11 - cman/daemon/barrier.c | 13 - cman/daemon/barrier.h | 12 - cman/daemon/cman-preconfig.c | 11 - cman/daemon/cman.h | 12 - cman/daemon/cmanconfig.c | 11 - cman/daemon/cmanconfig.h | 12 - cman/daemon/cnxman-private.h | 13 - cman/daemon/cnxman-socket.h | 13 - cman/daemon/commands.c | 13 - cman/daemon/commands.h | 12 - cman/daemon/daemon.c | 11 - cman/daemon/daemon.h | 12 - cman/daemon/list.h | 15 - cman/daemon/logging.c | 12 - cman/daemon/logging.h | 11 - cman/daemon/nodelist.h | 13 - cman/init.d/Makefile | 12 - cman/lib/Makefile | 43 +- cman/lib/libcman.c | 22 - cman/lib/libcman.h | 24 +- cman/man/Makefile | 13 - cman/man/cman.5 | 3 - cman/qdisk/Makefile | 12 - cman/qdisk/bitmap.c | 19 - cman/qdisk/crc32.c | 20 - cman/qdisk/daemon_init.c | 19 - cman/qdisk/disk.c | 19 - cman/qdisk/disk.h | 20 - cman/qdisk/disk_util.c | 20 - cman/qdisk/main.c | 20 - cman/qdisk/mkqdisk.c | 20 - cman/qdisk/platform.h | 19 - cman/qdisk/proc.c | 20 - cman/qdisk/scandisk.c | 19 - cman/qdisk/scandisk.h | 18 - cman/qdisk/score.c | 20 - cman/qdisk/score.h | 20 - cman/tests/Makefile | 12 - cman/tests/qwait.c | 9 - cman/tests/user_service.c | 13 - cmirror-kernel/src/dm-clog-tfr.c | 83 - cmirror-kernel/src/dm-clog-tfr.h | 40 - cmirror-kernel/src/dm-clog.c | 624 ----- cmirror/Makefile | 14 - config/Makefile | 13 - config/libs/Makefile | 13 - config/libs/libccsconfdb/Makefile | 44 +- config/libs/libccsconfdb/ccs.h | 13 +- config/libs/libccsconfdb/libccs.c | 298 ++- 
config/tools/Makefile | 13 - config/tools/ccs_test/Makefile | 12 - config/tools/ccs_test/ccs_test.c | 11 - config/tools/man/Makefile | 13 - config/tools/man/ccs_test.8 | 6 - configure | 49 +- csnap-kernel/Makefile | 14 - csnap-kernel/patches/2.6.15/00001.patch | 16 - csnap-kernel/patches/2.6.15/00002.patch | 32 - csnap-kernel/patches/2.6.15/00003.patch | 30 - csnap-kernel/patches/2.6.9/00001.patch | 16 - csnap-kernel/patches/2.6.9/00002.patch | 32 - csnap-kernel/patches/2.6.9/00003.patch | 30 - csnap-kernel/src/Makefile | 69 - csnap-kernel/src/dm-csnap.c | 1147 --------- csnap-kernel/src/dm-csnap.h | 70 - csnap/COPYING | 340 --- csnap/Makefile | 15 - csnap/README | 67 - csnap/doc/cluster.snapshot.design.html | 1467 ----------- csnap/doc/csnap.ps | 2994 ---------------------- csnap/patches/csnap-2.6.7-2.4.26 | 195 -- csnap/patches/csnap-2.6.8.1 | 1321 ---------- csnap/src/Makefile | 44 - csnap/src/agent.c | 359 --- csnap/src/buffer.c | 268 -- csnap/src/buffer.h | 60 - csnap/src/buffertest.c | 15 - csnap/src/create.c | 58 - csnap/src/csnap.c | 2623 ------------------- csnap/src/csnap.h | 44 - csnap/src/list.h | 64 - csnap/src/sock.h | 55 - csnap/src/trace.h | 7 - csnap/tests/Makefile | 49 - csnap/tests/devpoke.c | 55 - csnap/tests/devspam.c | 83 - csnap/tests/testclient.c | 185 -- dlm/Makefile | 12 - dlm/libdlm/Makefile | 21 +- dlm/libdlm/libdlm.c | 24 - dlm/libdlm/libdlm.h | 23 - dlm/libdlmcontrol/Makefile | 42 +- dlm/libdlmcontrol/libdlmcontrol.h | 22 - dlm/libdlmcontrol/main.c | 12 - dlm/man/Makefile | 12 - dlm/man/dlm_tool.8 | 6 - dlm/tests/Makefile | 12 - dlm/tests/usertest/Makefile | 12 - dlm/tests/usertest/alternate-lvb.c | 12 - dlm/tests/usertest/dlmtest2.c | 12 - dlm/tests/usertest/threads.c | 12 - dlm/tool/Makefile | 12 - dlm/tool/main.c | 12 - fence/Makefile | 13 - fence/agents/Makefile | 13 - fence/agents/apc/Makefile | 13 - fence/agents/apc/fence_apc.py | 3 +- fence/agents/apc_snmp/Makefile | 13 - fence/agents/apc_snmp/README | 2 - fence/agents/apc_snmp/fence_apc_snmp.py | 13 - fence/agents/baytech/Makefile | 13 - fence/agents/baytech/fence_baytech.pl | 13 - fence/agents/brocade/Makefile | 13 - fence/agents/brocade/fence_brocade.pl | 13 - fence/agents/bullpap/Makefile | 13 - fence/agents/bullpap/fence_bullpap.pl | 12 - fence/agents/cpint/Makefile | 13 - fence/agents/cpint/fence_cpint.pl | 13 - fence/agents/drac/Makefile | 13 - fence/agents/drac/fence_drac.pl | 12 - fence/agents/drac/fence_drac5.py | 3 +- fence/agents/egenera/Makefile | 13 - fence/agents/egenera/fence_egenera.pl | 13 - fence/agents/gnbd/Makefile | 23 + fence/agents/gnbd/main.c | 327 +++ fence/agents/ibmblade/Makefile | 13 - fence/agents/ibmblade/fence_ibmblade.pl | 13 - fence/agents/ifmib/Makefile | 13 - fence/agents/ilo/Makefile | 13 - fence/agents/ilo/fence_ilo.py | 3 +- fence/agents/ipmilan/Makefile | 13 - fence/agents/ipmilan/expect.c | 19 - fence/agents/ipmilan/expect.h | 16 - fence/agents/ipmilan/ipmilan.c | 17 - fence/agents/lib/Makefile | 14 - fence/agents/lib/fencing.py.py | 14 +- fence/agents/lpar/Makefile | 13 - fence/agents/lpar/fence_lpar.py | 3 +- fence/agents/manual/Makefile | 13 - fence/agents/manual/fence_ack_manual.sh | 12 - fence/agents/mcdata/Makefile | 13 - fence/agents/mcdata/fence_mcdata.pl | 14 - fence/agents/rackswitch/Makefile | 13 - fence/agents/rackswitch/do_rack.c | 12 - fence/agents/rps10/Makefile | 13 - fence/agents/rps10/rps10.c | 18 - fence/agents/rsa/Makefile | 13 - fence/agents/rsa/fence_rsa.py | 13 - fence/agents/rsb/Makefile | 13 - fence/agents/rsb/fence_rsb.py | 13 - 
fence/agents/sanbox2/Makefile | 13 - fence/agents/sanbox2/fence_sanbox2.pl | 13 - fence/agents/scsi/Makefile | 12 - fence/agents/vixel/Makefile | 13 - fence/agents/vixel/fence_vixel.pl | 13 - fence/agents/vmware/Makefile | 13 - fence/agents/vmware/fence_vmware.pl | 15 - fence/agents/wti/Makefile | 13 - fence/agents/wti/fence_wti.py | 3 +- fence/agents/xcat/Makefile | 13 - fence/agents/xcat/fence_xcat.pl | 9 - fence/agents/xvm/Makefile | 12 - fence/agents/xvm/debug.c | 18 - fence/agents/xvm/debug.h | 18 - fence/agents/xvm/fence_xvm.c | 18 - fence/agents/xvm/fence_xvmd.c | 18 - fence/agents/xvm/ip_lookup.c | 18 - fence/agents/xvm/ip_lookup.h | 18 - fence/agents/xvm/mcast.c | 18 - fence/agents/xvm/mcast.h | 18 - fence/agents/xvm/options-ccs.c | 18 - fence/agents/xvm/options.c | 18 - fence/agents/xvm/options.h | 18 - fence/agents/xvm/simple_auth.c | 18 - fence/agents/xvm/simple_auth.h | 18 - fence/agents/xvm/tcp.c | 19 - fence/agents/xvm/tcp.h | 18 - fence/agents/xvm/virt.c | 18 - fence/agents/xvm/virt.h | 18 - fence/agents/xvm/vm_states.c | 18 - fence/agents/xvm/xvm.h | 18 - fence/agents/zvm/Makefile | 13 - fence/agents/zvm/fence_zvm.pl | 13 - fence/fence_node/Makefile | 20 +- fence/fence_node/fence_node.c | 13 - fence/fence_tool/Makefile | 20 +- fence/fence_tool/fence_tool.c | 13 - fence/fenced/Makefile | 18 +- fence/fenced/config.c | 12 - fence/fenced/cpg.c | 12 - fence/fenced/fd.h | 13 - fence/fenced/fenced.h | 12 - fence/fenced/group.c | 12 - fence/fenced/main.c | 13 - fence/fenced/member_cman.c | 12 - fence/fenced/recover.c | 13 - fence/include/linux_endian.h | 13 - fence/libfence/Makefile | 42 +- fence/libfence/agent.c | 13 - fence/libfence/libfence.h | 21 - fence/libfenced/Makefile | 42 +- fence/libfenced/libfenced.h | 22 - fence/libfenced/main.c | 12 - fence/man/Makefile | 14 +- fence/man/fence.8 | 7 - fence/man/fence_ack_manual.8 | 7 - fence/man/fence_apc.8 | 7 - fence/man/fence_baytech.8 | 7 - fence/man/fence_bladecenter.8 | 7 - fence/man/fence_brocade.8 | 7 - fence/man/fence_bullpap.8 | 7 - fence/man/fence_cpint.8 | 7 - fence/man/fence_drac.8 | 6 - fence/man/fence_egenera.8 | 7 - fence/man/fence_gnbd.8 | 84 + fence/man/fence_ibmblade.8 | 7 - fence/man/fence_ilo.8 | 7 - fence/man/fence_ipmilan.8 | 7 - fence/man/fence_manual.8 | 7 - fence/man/fence_mcdata.8 | 7 - fence/man/fence_node.8 | 7 - fence/man/fence_rackswitch.8 | 7 - fence/man/fence_rib.8 | 7 - fence/man/fence_rsa.8 | 6 - fence/man/fence_rsb.8 | 6 - fence/man/fence_sanbox2.8 | 7 - fence/man/fence_scsi.8 | 6 - fence/man/fence_tool.8 | 7 - fence/man/fence_vixel.8 | 7 - fence/man/fence_wti.8 | 7 - fence/man/fence_xcat.8 | 3 - fence/man/fence_xvm.8 | 7 - fence/man/fence_xvmd.8 | 7 - fence/man/fence_zvm.8 | 7 - fence/man/fenced.8 | 7 - gfs-kernel/src/gfs/Makefile | 13 - gfs-kernel/src/gfs/acl.c | 13 - gfs-kernel/src/gfs/acl.h | 13 - gfs-kernel/src/gfs/bits.c | 13 - gfs-kernel/src/gfs/bits.h | 13 - gfs-kernel/src/gfs/bmap.c | 13 - gfs-kernel/src/gfs/bmap.h | 13 - gfs-kernel/src/gfs/daemon.c | 13 - gfs-kernel/src/gfs/daemon.h | 13 - gfs-kernel/src/gfs/dio.c | 13 - gfs-kernel/src/gfs/dio.h | 13 - gfs-kernel/src/gfs/dir.c | 13 - gfs-kernel/src/gfs/dir.h | 13 - gfs-kernel/src/gfs/eaops.c | 13 - gfs-kernel/src/gfs/eaops.h | 13 - gfs-kernel/src/gfs/eattr.c | 13 - gfs-kernel/src/gfs/eattr.h | 13 - gfs-kernel/src/gfs/file.c | 13 - gfs-kernel/src/gfs/file.h | 13 - gfs-kernel/src/gfs/fixed_div64.h | 34 - gfs-kernel/src/gfs/format.h | 13 - gfs-kernel/src/gfs/gfs.h | 13 - gfs-kernel/src/gfs/gfs_ioctl.h | 13 - gfs-kernel/src/gfs/gfs_ondisk.h | 
13 - gfs-kernel/src/gfs/glock.c | 13 - gfs-kernel/src/gfs/glock.h | 13 - gfs-kernel/src/gfs/glops.c | 13 - gfs-kernel/src/gfs/glops.h | 13 - gfs-kernel/src/gfs/incore.h | 13 - gfs-kernel/src/gfs/inode.c | 13 - gfs-kernel/src/gfs/inode.h | 13 - gfs-kernel/src/gfs/ioctl.c | 13 - gfs-kernel/src/gfs/ioctl.h | 13 - gfs-kernel/src/gfs/lm.c | 9 - gfs-kernel/src/gfs/lm.h | 13 - gfs-kernel/src/gfs/log.c | 13 - gfs-kernel/src/gfs/log.h | 13 - gfs-kernel/src/gfs/lops.c | 13 - gfs-kernel/src/gfs/lops.h | 13 - gfs-kernel/src/gfs/lvb.c | 13 - gfs-kernel/src/gfs/lvb.h | 13 - gfs-kernel/src/gfs/main.c | 13 - gfs-kernel/src/gfs/mount.c | 13 - gfs-kernel/src/gfs/mount.h | 13 - gfs-kernel/src/gfs/ondisk.c | 13 - gfs-kernel/src/gfs/ops_address.c | 13 - gfs-kernel/src/gfs/ops_address.h | 13 - gfs-kernel/src/gfs/ops_dentry.c | 13 - gfs-kernel/src/gfs/ops_dentry.h | 13 - gfs-kernel/src/gfs/ops_export.c | 13 - gfs-kernel/src/gfs/ops_export.h | 13 - gfs-kernel/src/gfs/ops_file.c | 13 - gfs-kernel/src/gfs/ops_file.h | 13 - gfs-kernel/src/gfs/ops_fstype.c | 10 - gfs-kernel/src/gfs/ops_fstype.h | 13 - gfs-kernel/src/gfs/ops_inode.c | 13 - gfs-kernel/src/gfs/ops_inode.h | 13 - gfs-kernel/src/gfs/ops_super.c | 13 - gfs-kernel/src/gfs/ops_super.h | 13 - gfs-kernel/src/gfs/ops_vm.c | 13 - gfs-kernel/src/gfs/ops_vm.h | 13 - gfs-kernel/src/gfs/page.c | 13 - gfs-kernel/src/gfs/page.h | 13 - gfs-kernel/src/gfs/proc.c | 13 - gfs-kernel/src/gfs/proc.h | 13 - gfs-kernel/src/gfs/quota.c | 13 - gfs-kernel/src/gfs/quota.h | 13 - gfs-kernel/src/gfs/recovery.c | 13 - gfs-kernel/src/gfs/recovery.h | 13 - gfs-kernel/src/gfs/rgrp.c | 13 - gfs-kernel/src/gfs/rgrp.h | 13 - gfs-kernel/src/gfs/super.c | 13 - gfs-kernel/src/gfs/super.h | 13 - gfs-kernel/src/gfs/sys.c | 13 - gfs-kernel/src/gfs/sys.h | 13 - gfs-kernel/src/gfs/trans.c | 13 - gfs-kernel/src/gfs/trans.h | 13 - gfs-kernel/src/gfs/unlinked.c | 13 - gfs-kernel/src/gfs/unlinked.h | 13 - gfs-kernel/src/gfs/util.c | 13 - gfs-kernel/src/gfs/util.h | 13 - gfs/Makefile | 13 - gfs/gfs_debug/Makefile | 13 - gfs/gfs_debug/basic.c | 13 - gfs/gfs_debug/basic.h | 13 - gfs/gfs_debug/block_device.c | 13 - gfs/gfs_debug/block_device.h | 13 - gfs/gfs_debug/gfs_debug.h | 13 - gfs/gfs_debug/main.c | 13 - gfs/gfs_debug/ondisk.c | 13 - gfs/gfs_debug/readfile.c | 13 - gfs/gfs_debug/readfile.h | 13 - gfs/gfs_debug/util.c | 13 - gfs/gfs_debug/util.h | 13 - gfs/gfs_edit/Makefile | 13 - gfs/gfs_edit/gfshex.c | 13 - gfs/gfs_edit/gfshex.h | 13 - gfs/gfs_edit/hexedit.c | 13 - gfs/gfs_edit/hexedit.h | 13 - gfs/gfs_fsck/Makefile | 12 - gfs/gfs_fsck/bio.c | 13 - gfs/gfs_fsck/bio.h | 13 - gfs/gfs_fsck/bitmap.c | 12 - gfs/gfs_fsck/bitmap.h | 13 - gfs/gfs_fsck/block_list.c | 12 - gfs/gfs_fsck/block_list.h | 12 - gfs/gfs_fsck/eattr.c | 12 - gfs/gfs_fsck/eattr.h | 12 - gfs/gfs_fsck/file.c | 13 - gfs/gfs_fsck/file.h | 13 - gfs/gfs_fsck/fs_bits.c | 13 - gfs/gfs_fsck/fs_bits.h | 13 - gfs/gfs_fsck/fs_bmap.c | 13 - gfs/gfs_fsck/fs_bmap.h | 13 - gfs/gfs_fsck/fs_dir.c | 13 - gfs/gfs_fsck/fs_dir.h | 13 - gfs/gfs_fsck/fs_inode.c | 13 - gfs/gfs_fsck/fs_inode.h | 13 - gfs/gfs_fsck/fs_recovery.c | 14 - gfs/gfs_fsck/fs_recovery.h | 13 - gfs/gfs_fsck/fsck.h | 12 - gfs/gfs_fsck/fsck_incore.h | 15 - gfs/gfs_fsck/hash.c | 13 - gfs/gfs_fsck/hash.h | 13 - gfs/gfs_fsck/initialize.c | 13 - gfs/gfs_fsck/inode.c | 12 - gfs/gfs_fsck/inode.h | 12 - gfs/gfs_fsck/inode_hash.c | 13 - gfs/gfs_fsck/inode_hash.h | 13 - gfs/gfs_fsck/link.c | 13 - gfs/gfs_fsck/link.h | 14 - gfs/gfs_fsck/log.c | 12 - gfs/gfs_fsck/log.h | 12 - 
gfs/gfs_fsck/lost_n_found.c | 13 - gfs/gfs_fsck/lost_n_found.h | 13 - gfs/gfs_fsck/main.c | 12 - gfs/gfs_fsck/metawalk.c | 12 - gfs/gfs_fsck/metawalk.h | 12 - gfs/gfs_fsck/ondisk.c | 13 - gfs/gfs_fsck/ondisk.h | 13 - gfs/gfs_fsck/pass1.c | 13 - gfs/gfs_fsck/pass1b.c | 13 - gfs/gfs_fsck/pass1c.c | 12 - gfs/gfs_fsck/pass2.c | 13 - gfs/gfs_fsck/pass3.c | 13 - gfs/gfs_fsck/pass4.c | 13 - gfs/gfs_fsck/pass5.c | 13 - gfs/gfs_fsck/rgrp.c | 14 - gfs/gfs_fsck/rgrp.h | 13 - gfs/gfs_fsck/super.c | 13 - gfs/gfs_fsck/super.h | 13 - gfs/gfs_fsck/test_bitmap.c | 12 - gfs/gfs_fsck/test_block_list.c | 12 - gfs/gfs_fsck/util.c | 13 - gfs/gfs_fsck/util.h | 13 - gfs/gfs_grow/Makefile | 13 - gfs/gfs_grow/main.c | 13 - gfs/gfs_grow/ondisk.c | 13 - gfs/gfs_jadd/Makefile | 13 - gfs/gfs_jadd/main.c | 13 - gfs/gfs_jadd/ondisk.c | 13 - gfs/gfs_mkfs/Makefile | 13 - gfs/gfs_mkfs/device_geometry.c | 13 - gfs/gfs_mkfs/fs_geometry.c | 13 - gfs/gfs_mkfs/locking.c | 13 - gfs/gfs_mkfs/main.c | 13 - gfs/gfs_mkfs/mkfs_gfs.h | 13 - gfs/gfs_mkfs/ondisk.c | 13 - gfs/gfs_mkfs/structures.c | 13 - gfs/gfs_quota/Makefile | 13 - gfs/gfs_quota/check.c | 13 - gfs/gfs_quota/gfs_quota.h | 13 - gfs/gfs_quota/layout.c | 13 - gfs/gfs_quota/main.c | 13 - gfs/gfs_quota/names.c | 13 - gfs/gfs_quota/ondisk.c | 13 - gfs/gfs_tool/Makefile | 13 - gfs/gfs_tool/counters.c | 13 - gfs/gfs_tool/decipher_lockstate_dump | 14 - gfs/gfs_tool/df.c | 13 - gfs/gfs_tool/gfs_tool.h | 13 - gfs/gfs_tool/layout.c | 13 - gfs/gfs_tool/main.c | 13 - gfs/gfs_tool/misc.c | 13 - gfs/gfs_tool/ondisk.c | 13 - gfs/gfs_tool/parse_lockdump | 14 - gfs/gfs_tool/sb.c | 13 - gfs/gfs_tool/tune.c | 13 - gfs/gfs_tool/util.c | 13 - gfs/include/global.h | 13 - gfs/include/linux_endian.h | 13 - gfs/include/osi_list.h | 13 - gfs/include/osi_user.h | 13 - gfs/init.d/Makefile | 12 - gfs/libgfs/Makefile | 51 +- gfs/libgfs/bio.c | 13 - gfs/libgfs/bitmap.c | 12 - gfs/libgfs/block_list.c | 12 - gfs/libgfs/file.c | 13 - gfs/libgfs/fs_bits.c | 13 - gfs/libgfs/fs_bmap.c | 13 - gfs/libgfs/fs_dir.c | 13 - gfs/libgfs/fs_inode.c | 13 - gfs/libgfs/incore.h | 13 - gfs/libgfs/inode.c | 12 - gfs/libgfs/log.c | 12 - gfs/libgfs/ondisk.c | 13 - gfs/libgfs/rgrp.c | 14 - gfs/libgfs/size.c | 13 - gfs/libgfs/super.c | 13 - gfs/libgfs/util.c | 13 - gfs/man/Makefile | 13 - gfs/man/gfs.8 | 3 - gfs/man/gfs_edit.8 | 2 - gfs/man/gfs_fsck.8 | 3 - gfs/man/gfs_grow.8 | 3 - gfs/man/gfs_jadd.8 | 3 - gfs/man/gfs_mkfs.8 | 3 - gfs/man/gfs_mount.8 | 8 - gfs/man/gfs_quota.8 | 3 - gfs/man/gfs_tool.8 | 3 - gfs/tests/Makefile | 12 - gfs/tests/filecon2/Makefile | 13 - gfs/tests/filecon2/filecon2.h | 13 - gfs/tests/filecon2/filecon2_client.c | 13 - gfs/tests/filecon2/filecon2_server.c | 13 - gfs/tests/mmdd/Makefile | 13 - gfs/tests/mmdd/mmdd.c | 13 - gfs2/Makefile | 13 - gfs2/convert/Makefile | 12 - gfs2/convert/gfs2_convert.c | 6 - gfs2/debug/Makefile | 59 - gfs2/debug/basic.c | 471 ---- gfs2/debug/basic.h | 39 - gfs2/debug/block_device.c | 130 - gfs2/debug/block_device.h | 27 - gfs2/debug/gfs2_debug.h | 96 - gfs2/debug/main.c | 192 -- gfs2/debug/ondisk.c | 25 - gfs2/debug/readfile.c | 228 -- gfs2/debug/readfile.h | 27 - gfs2/debug/util.c | 347 --- gfs2/debug/util.h | 42 - gfs2/edit/Makefile | 13 - gfs2/edit/gfs2hex.c | 26 +- gfs2/edit/gfs2hex.h | 13 - gfs2/edit/hexedit.c | 154 +- gfs2/edit/hexedit.h | 14 - gfs2/edit/savemeta.c | 70 +- gfs2/fsck/Makefile | 12 - gfs2/fsck/eattr.c | 12 - gfs2/fsck/eattr.h | 12 - gfs2/fsck/fs_bits.h | 13 - gfs2/fsck/fs_recovery.c | 13 - gfs2/fsck/fs_recovery.h | 13 - gfs2/fsck/fsck.h | 12 - 
gfs2/fsck/hash.c | 13 - gfs2/fsck/hash.h | 13 - gfs2/fsck/initialize.c | 13 - gfs2/fsck/inode_hash.c | 13 - gfs2/fsck/inode_hash.h | 13 - gfs2/fsck/link.c | 13 - gfs2/fsck/link.h | 14 - gfs2/fsck/lost_n_found.c | 13 - gfs2/fsck/lost_n_found.h | 13 - gfs2/fsck/main.c | 12 - gfs2/fsck/metawalk.c | 12 - gfs2/fsck/metawalk.h | 12 - gfs2/fsck/pass1.c | 13 - gfs2/fsck/pass1b.c | 13 - gfs2/fsck/pass1c.c | 12 - gfs2/fsck/pass2.c | 13 - gfs2/fsck/pass3.c | 13 - gfs2/fsck/pass4.c | 13 - gfs2/fsck/pass5.c | 13 - gfs2/fsck/rgrepair.c | 13 - gfs2/fsck/test.c | 1 - gfs2/fsck/test_bitmap.c | 12 - gfs2/fsck/test_block_list.c | 12 - gfs2/fsck/util.c | 13 - gfs2/fsck/util.h | 13 - gfs2/include/gfs2_disk_hash.h | 13 - gfs2/include/global.h | 13 - gfs2/include/linux_endian.h | 13 - gfs2/include/osi_list.h | 13 - gfs2/include/osi_user.h | 13 - gfs2/init.d/Makefile | 12 - gfs2/libgfs2/Makefile | 48 +- gfs2/libgfs2/bitmap.c | 12 - gfs2/libgfs2/block_list.c | 12 - gfs2/libgfs2/buf.c | 13 - gfs2/libgfs2/device_geometry.c | 13 - gfs2/libgfs2/fs_bits.c | 13 - gfs2/libgfs2/fs_geometry.c | 13 - gfs2/libgfs2/fs_ops.c | 13 - gfs2/libgfs2/gfs2_log.c | 12 - gfs2/libgfs2/libgfs2.h | 13 - gfs2/libgfs2/locking.c | 13 - gfs2/libgfs2/misc.c | 13 - gfs2/libgfs2/ondisk.c | 13 - gfs2/libgfs2/ondisk.h | 9 - gfs2/libgfs2/recovery.c | 9 - gfs2/libgfs2/rgrp.c | 13 - gfs2/libgfs2/size.c | 13 - gfs2/libgfs2/structures.c | 13 - gfs2/libgfs2/super.c | 13 - gfs2/man/Makefile | 13 - gfs2/man/gfs2.8 | 3 - gfs2/man/gfs2_convert.8 | 3 - gfs2/man/gfs2_edit.8 | 6 +- gfs2/man/gfs2_fsck.8 | 3 - gfs2/man/gfs2_grow.8 | 3 - gfs2/man/gfs2_jadd.8 | 3 - gfs2/man/gfs2_mount.8 | 8 - gfs2/man/gfs2_quota.8 | 3 - gfs2/man/gfs2_tool.8 | 3 - gfs2/man/mkfs.gfs2.8 | 3 - gfs2/mkfs/Makefile | 5 - gfs2/mkfs/gfs2_mkfs.h | 13 - gfs2/mkfs/main.c | 13 - gfs2/mkfs/main_grow.c | 12 - gfs2/mkfs/main_jadd.c | 11 - gfs2/mkfs/main_mkfs.c | 13 - gfs2/mount/Makefile | 18 +- gfs2/mount/mount.gfs2.c | 8 - gfs2/mount/mtab.c | 14 +- gfs2/mount/ondisk1.c | 13 - gfs2/mount/ondisk2.c | 13 - gfs2/mount/util.c | 8 - gfs2/mount/util.h | 8 - gfs2/quota/Makefile | 13 - gfs2/quota/check.c | 13 - gfs2/quota/gfs2_quota.h | 13 - gfs2/quota/main.c | 12 - gfs2/quota/names.c | 13 - gfs2/tool/Makefile | 13 - gfs2/tool/decipher_lockstate_dump | 14 - gfs2/tool/df.c | 13 - gfs2/tool/gfs2_tool.h | 13 - gfs2/tool/iflags.h | 13 - gfs2/tool/layout.c | 13 - gfs2/tool/main.c | 13 - gfs2/tool/misc.c | 13 - gfs2/tool/ondisk.c | 13 - gfs2/tool/parse_lockdump | 14 - gfs2/tool/sb.c | 13 - gfs2/tool/tune.c | 13 - gnbd-kernel/src/Makefile | 13 - gnbd-kernel/src/gnbd.c | 13 - gnbd-kernel/src/gnbd.h | 13 - gnbd/COPYING | 340 --- gnbd/Makefile | 13 - gnbd/client/Makefile | 13 - gnbd/client/gnbd_monitor.c | 12 - gnbd/client/gnbd_monitor.h | 12 - gnbd/client/gnbd_recvd.c | 12 - gnbd/client/monitor_req.c | 12 - gnbd/include/global.h | 13 - gnbd/include/gnbd_endian.h | 13 - gnbd/man/Makefile | 16 +- gnbd/man/fence_gnbd.8 | 87 - gnbd/man/gnbd.8 | 3 - gnbd/man/gnbd_export.8 | 3 - gnbd/man/gnbd_import.8 | 3 - gnbd/man/gnbd_serv.8 | 2 - gnbd/server/Makefile | 13 - gnbd/server/device.c | 12 - gnbd/server/device.h | 12 - gnbd/server/extern_req.c | 11 - gnbd/server/extern_req.h | 12 - gnbd/server/fence.c | 12 - gnbd/server/fence.h | 12 - gnbd/server/gnbd_clusterd.c | 12 - gnbd/server/gnbd_serv.c | 12 - gnbd/server/gnbd_server.h | 12 - gnbd/server/gserv.c | 12 - gnbd/server/gserv.h | 12 - gnbd/server/list.h | 13 - gnbd/server/local_req.c | 12 - gnbd/server/local_req.h | 12 - gnbd/tools/Makefile | 15 +- 
gnbd/tools/fence_gnbd/Makefile | 37 - gnbd/tools/fence_gnbd/main.c | 340 --- gnbd/tools/gnbd_export/Makefile | 13 - gnbd/tools/gnbd_export/gnbd_export.c | 14 - gnbd/tools/gnbd_import/Makefile | 13 - gnbd/tools/gnbd_import/fence_return.h | 13 - gnbd/tools/gnbd_import/gnbd_import.c | 12 - gnbd/utils/Makefile | 13 - gnbd/utils/gnbd_utils.c | 12 - gnbd/utils/gnbd_utils.h | 12 - gnbd/utils/member_cman.c | 12 - gnbd/utils/member_cman.h | 12 - gnbd/utils/trans.c | 12 - gnbd/utils/trans.h | 12 - group/Makefile | 12 - group/daemon/Makefile | 12 - group/daemon/gd_internal.h | 13 - group/daemon/groupd.h | 13 - group/daemon/main.c | 12 - group/dlm_controld/Makefile | 20 +- group/dlm_controld/action.c | 12 - group/dlm_controld/config.c | 12 - group/dlm_controld/config.h | 12 - group/dlm_controld/cpg.c | 12 - group/dlm_controld/crc.c | 13 - group/dlm_controld/deadlock.c | 12 - group/dlm_controld/dlm_controld.h | 12 - group/dlm_controld/dlm_daemon.h | 12 - group/dlm_controld/group.c | 12 - group/dlm_controld/main.c | 12 - group/dlm_controld/member_cman.c | 12 - group/dlm_controld/netlink.c | 12 - group/dlm_controld/plock.c | 12 - group/gfs_control/Makefile | 17 +- group/gfs_control/main.c | 12 - group/gfs_controld/Makefile | 20 +- group/gfs_controld/config.c | 12 - group/gfs_controld/config.h | 12 - group/gfs_controld/cpg-old.c | 12 - group/gfs_controld/cpg-old.h | 12 - group/gfs_controld/gfs_controld.h | 12 - group/gfs_controld/gfs_daemon.h | 12 - group/gfs_controld/group.c | 12 - group/gfs_controld/main.c | 12 - group/gfs_controld/member_cman.c | 12 - group/gfs_controld/plock.c | 12 - group/gfs_controld/util.c | 12 - group/include/linux_endian.h | 13 - group/lib/Makefile | 25 +- group/lib/libgroup.c | 22 - group/lib/libgroup.h | 22 - group/libgfscontrol/Makefile | 43 +- group/libgfscontrol/libgfscontrol.h | 22 - group/libgfscontrol/main.c | 12 - group/man/Makefile | 12 - group/man/dlm_controld.8 | 6 - group/man/gfs_controld.8 | 6 - group/man/group_tool.8 | 6 - group/man/groupd.8 | 6 - group/test/Makefile | 12 - group/test/clientd.c | 12 - group/tool/Makefile | 12 - group/tool/main.c | 12 - make/copyright.cf | 16 - make/defines.mk.input | 18 +- make/libs.mk | 47 + rgmanager/AUTHORS | 13 - rgmanager/COPYING | 340 --- rgmanager/INSTALL | 7 - rgmanager/Makefile | 13 - rgmanager/NEWS | 2 - rgmanager/include/clulog.h | 22 - rgmanager/include/event.h | 17 - rgmanager/include/findproc.h | 18 - rgmanager/include/platform.h | 19 - rgmanager/include/res-ocf.h | 18 - rgmanager/include/reslist.h | 18 - rgmanager/include/restart_counter.h | 17 - rgmanager/include/rg_locks.h | 17 - rgmanager/include/rg_queue.h | 17 - rgmanager/include/rmtab.h | 18 - rgmanager/include/sets.h | 17 - rgmanager/include/vf.h | 18 - rgmanager/init.d/Makefile | 12 - rgmanager/init.d/rgmanager.in | 6 - rgmanager/man/Makefile | 13 - rgmanager/src/Makefile | 13 - rgmanager/src/clulib/Makefile | 12 - rgmanager/src/clulib/alloc.c | 22 - rgmanager/src/clulib/ckpt_state.c | 18 - rgmanager/src/clulib/clulog.c | 19 - rgmanager/src/clulib/cman.c | 18 - rgmanager/src/clulib/daemon_init.c | 19 - rgmanager/src/clulib/fdops.c | 18 - rgmanager/src/clulib/lock.c | 18 - rgmanager/src/clulib/locktest.c | 18 - rgmanager/src/clulib/members.c | 18 - rgmanager/src/clulib/message.c | 18 - rgmanager/src/clulib/msg_cluster.c | 18 - rgmanager/src/clulib/msg_socket.c | 18 - rgmanager/src/clulib/msgsimple.c | 19 - rgmanager/src/clulib/msgtest.c | 18 - rgmanager/src/clulib/rg_strings.c | 18 - rgmanager/src/clulib/sets.c | 17 - rgmanager/src/clulib/signals.c | 18 - 
rgmanager/src/clulib/tmgr.c | 19 - rgmanager/src/clulib/vft.c | 18 - rgmanager/src/clulib/wrap_lock.c | 19 - rgmanager/src/daemons/Makefile | 12 - rgmanager/src/daemons/clurmtabd.c | 18 - rgmanager/src/daemons/clurmtabd_lib.c | 18 - rgmanager/src/daemons/depends.c | 19 - rgmanager/src/daemons/event_config.c | 17 - rgmanager/src/daemons/fo_domain.c | 18 - rgmanager/src/daemons/groups.c | 19 - rgmanager/src/daemons/main.c | 19 - rgmanager/src/daemons/reslist.c | 18 - rgmanager/src/daemons/resrules.c | 18 - rgmanager/src/daemons/restart_counter.c | 17 - rgmanager/src/daemons/restree.c | 21 - rgmanager/src/daemons/rg_event.c | 17 - rgmanager/src/daemons/rg_forward.c | 18 - rgmanager/src/daemons/rg_locks.c | 18 - rgmanager/src/daemons/rg_queue.c | 18 - rgmanager/src/daemons/rg_state.c | 18 - rgmanager/src/daemons/rg_thread.c | 18 - rgmanager/src/daemons/service_op.c | 17 - rgmanager/src/daemons/slang_event.c | 17 - rgmanager/src/daemons/test.c | 18 - rgmanager/src/daemons/watchdog.c | 18 - rgmanager/src/resources/Makefile | 12 - rgmanager/src/resources/apache.sh | 23 - rgmanager/src/resources/clusterfs.sh | 20 - rgmanager/src/resources/fs.sh | 20 - rgmanager/src/resources/ip.sh | 20 - rgmanager/src/resources/lvm.sh | 19 - rgmanager/src/resources/lvm_by_lv.sh | 19 - rgmanager/src/resources/lvm_by_vg.sh | 19 - rgmanager/src/resources/mysql.sh | 23 - rgmanager/src/resources/named.sh | 27 +- rgmanager/src/resources/netfs.sh | 20 - rgmanager/src/resources/nfsclient.sh | 19 - rgmanager/src/resources/nfsexport.sh | 20 - rgmanager/src/resources/nfsserver.sh | 19 - rgmanager/src/resources/ocf-shellfuncs | 20 - rgmanager/src/resources/openldap.sh | 23 - rgmanager/src/resources/postgres-8.sh | 29 +- rgmanager/src/resources/samba.sh | 33 +- rgmanager/src/resources/script.sh | 19 - rgmanager/src/resources/service.sh | 19 - rgmanager/src/resources/smb.sh | 24 - rgmanager/src/resources/svclib_nfslock | 18 - rgmanager/src/resources/tomcat-5.sh | 25 +- rgmanager/src/resources/utils/config-utils.sh.in | 19 - rgmanager/src/resources/utils/member_util.sh | 25 - rgmanager/src/resources/utils/messages.sh | 25 - rgmanager/src/resources/utils/ra-skelet.sh | 22 - rgmanager/src/resources/vm.sh | 18 - rgmanager/src/utils/Makefile | 12 - rgmanager/src/utils/cluarp.c | 19 - rgmanager/src/utils/clubufflush.c | 19 - rgmanager/src/utils/clufindhostname.c | 19 - rgmanager/src/utils/clulog.c | 19 - rgmanager/src/utils/clunfslock.sh | 4 - rgmanager/src/utils/clunfsops.c | 18 - rgmanager/src/utils/clusvcadm.c | 18 - rgmanager/src/utils/syscall.h | 17 - scripts/fenceparse | 12 - scripts/uninstall.pl | 13 - 832 files changed, 2124 insertions(+), 25812 deletions(-) - -- I'm going to make him an offer he can't refuse. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSEze7wgUGcMLQ3qJAQJuag/9ElCvjLF8kvTAzXhIrJFz87bHYBHcoLdu 0sbkXyuqRRJn3lx4Cnvs0OcFKS7Z5QWz7163/n+jnotJkP+ZjEKq4BCz5RbP5jhJ LoEYIfs9AEIdg/1UKxcgIrFZLm/ETexW3v8ou/pnEolo0+xgC6NEQKM2/IHYcQMY EP5kuZIFI8j2NIQJCFDtGFiRWfGyk4mqMdRvm4a1D0D3uTIa1m5rPdm0cGl2mBY9 1YQUp331M79VhAKKAXq0an+0kETeZthHdo/6uxSAB8csOz/oSvH4uZohPTs34QGH AHao2qQH9bXajY8c3UYry36lrVuNyGoJY1yuxJP0X48ua5f04IusuqJDBSRYoTyk lzsXxzzWOPgXY6v2yPZoFHHRBA/p6ugxRWfR0938ZHlpfuI4XprbLtnFg66BCBQ1 KpSha84OWaTZGBBuYYsqVwJcVBYC/GG9USOq/1pq8l9ha3xnwQYhWSgwKHbDPBy4 s5JbPzRvts0K1n7nvgAPbE9IFKRZLaFQjYNpIUbZFNbThJw5o4qAfS+uDfmjnZJO DoWSycVVxfg7Teh0RQYf5fJZ1ZW7nW6XBbp+8Oed2eLnn2xpodt+ghxlvfHUtAjh JWZWJ4EUG+acqPrMkiHWEtGB794XrGy9kaQ7+RSQtJs0TQO7vIiDxXLk9RDw+dKe Hc8zdwtRrho= =cUUA -----END PGP SIGNATURE----- From Alain.Moulle at bull.net Mon Jun 9 09:04:38 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 09 Jun 2008 11:04:38 +0200 Subject: [Linux-cluster] CS5 / about loop "Node is undead" Message-ID: <484CF226.1040602@bull.net> Hi About my problem of node entering a loop : Jun 3 15:54:49 s_sys at xn2 qdiskd[22256]: Writing eviction notice for node 1 Jun 3 15:54:50 s_sys at xn2 qdiskd[22256]: Node 1 evicted Jun 3 15:54:51 s_sys at xn2 qdiskd[22256]: Node 1 is undead. I notice that just before entering this loop, I have a message : Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1" Jun 3 15:54:48 s_sys at xn2 qdiskd[22256]: Assuming master role but never the message : Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success Nethertheless, the service of xn1 is well failovered by xn2, but then after the reboot of xn1, we can't start again the CS5 due to the problem of infernal loop "Node is undead" on xn2. whereas when it works correctly, both messages : fencing node "xn1" fence "xn1" success are successive (after about 30s) So my question is : could this pb of infernal loop "Node is undead" be systematically due to a failed fencing phase of xn2 towards xn1 ? PS: note that I have applied patch : http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Thanks Regards Alain Moull? From ccaulfie at redhat.com Mon Jun 9 09:21:19 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 09 Jun 2008 10:21:19 +0100 Subject: [Linux-cluster] DLM Book updated Message-ID: <484CF60F.4050804@redhat.com> I have updated the "Programming Locking Applications" document. Lots of typos and bizarre sentences have been fixed (thanks Bob!). I have also added a new section (chapter 5) which is an overview of the DLM internals for those that want to understand a little of how and where locks are mastered etc. It's no substitute for reading the code but it might make it a little easier :) http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf Chrissie From ccaulfie at redhat.com Mon Jun 9 10:19:14 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 09 Jun 2008 11:19:14 +0100 Subject: [Linux-cluster] DLM Book updated In-Reply-To: <484CF60F.4050804@redhat.com> References: <484CF60F.4050804@redhat.com> Message-ID: <484D03A2.1030908@redhat.com> Christine Caulfield wrote: > I have updated the "Programming Locking Applications" document. Lots of > typos and bizarre sentences have been fixed (thanks Bob!). I have also > added a new section (chapter 5) which is an overview of the DLM > internals for those that want to understand a little of how and where > locks are mastered etc. 
> > It's no substitute for reading the code but it might make it a little > easier :) > > http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf I should also have mentioned that the file is also available in the cluster wiki at: http://sources.redhat.com/cluster/wiki/ -- Chrissie From yamato at redhat.com Mon Jun 9 10:29:06 2008 From: yamato at redhat.com (Masatake YAMATO) Date: Mon, 09 Jun 2008 19:29:06 +0900 (JST) Subject: [Linux-cluster] DLM Book updated In-Reply-To: <484D03A2.1030908@redhat.com> References: <484CF60F.4050804@redhat.com> <484D03A2.1030908@redhat.com> Message-ID: <20080609.192906.106743919.yamato@redhat.com> I would be quite happy if you mentioned in the book that wireshark-1.0.0 has a DLM3 protocol dissector. Wireshark is really helpful for readers of the book who want to understand the behavior of the DLM. Regards, Masatake YAMATO From ccaulfie at redhat.com Mon Jun 9 10:32:54 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 09 Jun 2008 11:32:54 +0100 Subject: [Linux-cluster] DLM Book updated In-Reply-To: <20080609.192906.106743919.yamato@redhat.com> References: <484CF60F.4050804@redhat.com> <484D03A2.1030908@redhat.com> <20080609.192906.106743919.yamato@redhat.com> Message-ID: <484D06D6.8010005@redhat.com> Masatake YAMATO wrote: > I would be quite happy if you mentioned in the book that wireshark-1.0.0 > has a DLM3 protocol dissector. Wireshark is really helpful for readers > of the book who want to understand the behavior of the DLM. I did mention it in another document, about Cluster Suite networking. But yes, it would be nice to have a reference in that book too; I'll add it. Thanks Chrissie From alain.richard at equation.fr Mon Jun 9 11:45:02 2008 From: alain.richard at equation.fr (Alain RICHARD) Date: Mon, 9 Jun 2008 13:45:02 +0200 Subject: [Linux-cluster] DLM Book updated In-Reply-To: <484CF60F.4050804@redhat.com> References: <484CF60F.4050804@redhat.com> Message-ID: On 9 Jun 08, at 11:21, Christine Caulfield wrote: > I have updated the "Programming Locking Applications" document. Lots > of typos and bizarre sentences have been fixed (thanks Bob!). I have > also added a new section (chapter 5) which is an overview of the DLM > internals for those that want to understand a little of how and > where locks are mastered etc. > > It's no substitute for reading the code but it might make it a > little easier :) > > http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster You mention in this document that the dlm is able to use SCTP, but I found no information on how to do it; are there any documents about it? Regards, -- Alain RICHARD EQUATION SA Tel: +33 477 79 48 00 Fax: +33 477 79 48 01 E-Liance, operator for businesses and local authorities, fibre optic, SDSL and ADSL links -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Mon Jun 9 12:06:43 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Mon, 09 Jun 2008 13:06:43 +0100 Subject: [Linux-cluster] DLM Book updated In-Reply-To: References: <484CF60F.4050804@redhat.com> Message-ID: <484D1CD3.5070907@redhat.com> Alain RICHARD wrote: > > On 9 Jun 08, at 11:21, Christine Caulfield wrote: > >> I have updated the "Programming Locking Applications" document. Lots >> of typos and bizarre sentences have been fixed (thanks Bob!).
I have >> also added a new section (chapter 5) which is an overview of the DLM >> internals for those that want to understand a little of how and where >> locks are mastered etc. >> >> It's no substitute for reading the code but it might make it a little >> easier :) >> >> http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf >> >> Chrissie >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > > You mention in this document that dlm is able to use SCTP, but I found > no information on how to do it, is there any documents about it ? > It's not (well) tested so it's regarded as unsupported at the moment. If you want to test it you'll need to add this to cluster.conf (inside the tags: and the following sysctls to keep SCTP itself happy" # echo 4194304 > /proc/sys/net/core/rmem_default # echo 4194304 > /proc/sys/net/core/rmem_max Chrissie From Alain.Moulle at bull.net Mon Jun 9 12:23:13 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 09 Jun 2008 14:23:13 +0200 Subject: [Linux-cluster] CS5 / quorum disk and heuristics Message-ID: <484D20B1.9040502@bull.net> Hi One thing bothers me again : I have this record in cluster.conf : where 172.20.0.110 is a third machine not in my cluster pair node1/node2 My last understanding was that quorum disk was NOT a redundancy of heart-beat, meaning that if heart-beat interface fails, there is a failover but it is always the node with the expected min_score in quorum disk which fence the other. So I thought that the quorum disk check was operationnal only if the node detects a problem on heart-beat interface ... but when I set down the interface on the third machine, and after a few seconds, both nodes node1/node2 are killed !!! Whereas heart-beat interface was working fine. And after reboot, I can see "cluster not quorate" etc. So in fact, even if the heart-beat interface works fine, but there is not the expected min_score for heuristics of quorum disk, both nodes are stopped. Is it the normal behavior ? Thanks Regards Alain Moull? From rohara at redhat.com Mon Jun 9 14:56:34 2008 From: rohara at redhat.com (Ryan O'Hara) Date: Mon, 09 Jun 2008 09:56:34 -0500 Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!) In-Reply-To: References: Message-ID: <484D44A2.4090706@redhat.com> Fabio M. Di Nitto wrote: > ccs_test(8): not fully completed yet (another email will follow). ccs_test should go away. It was never intended to be used as a production tool, it was simply intended to be a tool to test ccs. Futhermore, the fact that you must create connections and then use those connection ID's in order to extract information from ccs is overkill. What we really want is a simple tool that handles xpath queries for config information. The idea of "connections" should be hidden from the user. I believe there is some overlap between ccs_test and ccs_tool. If I recall, ccs_tool can handle some simple xpath queries. Even better is that users do not have to create connections, etc. From fdinitto at redhat.com Mon Jun 9 15:22:38 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 9 Jun 2008 17:22:38 +0200 (CEST) Subject: [Linux-cluster] Changes in libccs behaviour (PLEASE READ!) In-Reply-To: <484D44A2.4090706@redhat.com> References: <484D44A2.4090706@redhat.com> Message-ID: On Mon, 9 Jun 2008, Ryan O'Hara wrote: > Fabio M. Di Nitto wrote: > >> ccs_test(8): not fully completed yet (another email will follow). > > ccs_test should go away. 
It was never intended to be used as a production > tool, it was simply intended to be a tool to test ccs. Indeed, but it is used as such and this is a fact :) > Furthermore, the fact > that you must create connections and then use those connection IDs in order > to extract information from ccs is overkill. What we really want is a simple > tool that handles xpath queries for config information. The idea of > "connections" should be hidden from the user. Not anymore. With the new libccs, there is no need to establish a connection. You just need to pass something > 0 instead of the fd. I kept the fd option around to avoid breaking compatibility. What ccs_test is missing is only an option to select full xpath vs xpath lite at the moment. > I believe there is some overlap between ccs_test and ccs_tool. If I recall, > ccs_tool can handle some simple xpath queries. Even better is that users do > not have to create connections, etc. No, ccs_tool doesn't handle queries at all. As above, there is no need to create connections any longer :) I only need to finish that switch and write those changes. Fabio -- I'm going to make him an offer he can't refuse. From fdinitto at redhat.com Mon Jun 9 19:22:53 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 9 Jun 2008 21:22:53 +0200 (CEST) Subject: [Linux-cluster] New fencing method In-Reply-To: <20080519230347.GA30667@kallisti.us> References: <20080519230347.GA30667@kallisti.us> Message-ID: On Mon, 19 May 2008, Ross Vandegrift wrote: > Hello everyone, > > I wrote a new fencing method script that fences by remotely shutting > down a switchport. The idea is to fabric fence an iSCSI client by > shutting down the port used for iSCSI connectivity. Hi Ross, for your information the agent will be part of our stable releases starting from the next one (2.03.04). Thanks again for your help and contribution. Fabio -- I'm going to make him an offer he can't refuse. From lhh at redhat.com Mon Jun 9 20:24:00 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 09 Jun 2008 16:24:00 -0400 Subject: [Linux-cluster] CS5 / is there a tunable timer between the three start/stop tries ? In-Reply-To: <48468843.5040300@bull.net> References: <48468843.5040300@bull.net> Message-ID: <1213043040.27637.4.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-06-04 at 14:19 +0200, Alain Moulle wrote: > Hi > > With CS5, when the status of a service returns failed, the CS5 tries > to start the service three times, so we can see three start/stop > sequences if it does not start correctly each time. The following > start is always launched just after the stop; > is there a tunable timer between the three start/stop tries? Not currently. -- Lon From lhh at redhat.com Mon Jun 9 20:25:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 09 Jun 2008 16:25:40 -0400 Subject: [Linux-cluster] CS5 / about loop "Node is undead" In-Reply-To: <48468ED9.3050401@bull.net> References: <48468ED9.3050401@bull.net> Message-ID: <1213043140.27637.7.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-06-04 at 14:47 +0200, Alain Moulle wrote: > Hi > > About my problem of node entering a loop : > Jun 3 15:54:49 s_sys at xn2 qdiskd[22256]: Writing eviction notice for node 1 > Jun 3 15:54:50 s_sys at xn2 qdiskd[22256]: Node 1 evicted > Jun 3 15:54:51 s_sys at xn2 qdiskd[22256]: Node 1 is undead.
> > I notice that just before entering this loop, I have a message : > Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1" > Jun 3 15:54:48 s_sys at xn2 qdiskd[22256]: Assuming master role > > but never the message : > Jun 3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success > > Nethertheless, the service of xn1 is well failovered by xn2, but > then after the reboot of xn1, we can't start again the CS5 due > to the problem of infernal loop "Node is undead" on xn2. > > whereas when it works correctly, both messages : > fencing node "xn1" > fence "xn1" success > are successive (after about 30s) > > So my question is : could this pb of infernal loop "Node is undead" > be systematically due to a failed fencing phase of xn2 towards xn1 ? > > PS: note that I have applied patch : > http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Yes. If qdiskd thinks the node is dead and the node started writing to the disk again (which is what fencing should prevent), it will display those messages. -- Lon From rpeterso at redhat.com Mon Jun 9 21:16:56 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 09 Jun 2008 16:16:56 -0500 Subject: [Linux-cluster] GFS performance tuning Message-ID: <1213046217.21321.53.camel@technetium.msp.redhat.com> Hi Everyone, I just wanted to let everyone here know that I just updated the cluster wiki page regarding GFS performance tuning. I added a bunch of information about increasing GFS performance: 1. How to use "fast statfs". 2. Disabling updatedb for GFS. 3. More considerations about the Resource Group size and the new "bitfit" function. 4. Designing your environment with the DLM in mind. 5. How to use "glock trimming". The updates are here: http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_tuning Regards, Bob Peterson Red Hat GFS & Clustering From james.hofmeister at hp.com Mon Jun 9 21:19:33 2008 From: james.hofmeister at hp.com (Hofmeister, James (WTEC Linux)) Date: Mon, 9 Jun 2008 21:19:33 +0000 Subject: [Linux-cluster] Scipt to revert GFS2 to GFS1? In-Reply-To: <1213043140.27637.7.camel@ayanami.boston.devel.redhat.com> Message-ID: Hello All, I have a customer RHEL-5.1 who converted a GFS1 file system to GFS2 with gfs2_convert. They are experiencing hangs on unmount of GFS2 file systems since this change. Is there a tool to convert GFS2 file systems back to GFS1? Regards, James Hofmeister Hewlett Packard Linux Solutions Engineer From rpeterso at redhat.com Mon Jun 9 21:32:41 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 09 Jun 2008 16:32:41 -0500 Subject: [Linux-cluster] Scipt to revert GFS2 to GFS1? In-Reply-To: References: Message-ID: <1213047161.21321.67.camel@technetium.msp.redhat.com> On Mon, 2008-06-09 at 21:19 +0000, Hofmeister, James (WTEC Linux) wrote: > Hello All, > > I have a customer RHEL-5.1 who converted a GFS1 file system to GFS2 with gfs2_convert. They are experiencing hangs on unmount of GFS2 file systems since this change. Is there a tool to convert GFS2 file systems back to GFS1? > > Regards, > James Hofmeister > Hewlett Packard Linux Solutions Engineer The short answer is No, the gfs2_convert tool is one-way. It could be done because the on-disk formats are not that different. 
You would have to write a tool that does gfs2_convert in reverse: changing all the inode numbers back to match their disk block locations, converting all your journals back into giant journal-sized holes in the file system, and changing all the file flags from standard Linux format to GFS format. This would not be impossible, but it would be a big project. The biggest challenge would be in figuring out where the journals belong and moving anything that got moved to those locations to different RGs. That would be a very good "start" of a "gfs_shrink" tool that doesn't exist, by the way. So I recommend they upgrade to the latest and greatest GFS2 code, which would either be from the nwm git tree, or else the newest RHEL5.2 kmod RPM. Then, if they still have a problem unmounting, post the symptoms and we'll try to address the issues here. If there is a bug in the unmount code, we need to find and fix it. I am only aware of one such bug at the moment, which is: https://bugzilla.redhat.com/show_bug.cgi?id=207697 You may or may not be able to read this bug record; my apologies if you can't read it; the bug record permissions are out of my control. This bug is only for the unmount that happens when systems are rebooted. There is a work-around for it, too, which is to enable the gfs2 init script. If you've encountered some other problem, and it can be recreated on recent levels of GFS2 code, please open a bugzilla record so we can help find and fix it. Regards, Bob Peterson Red Hat Clustering & GFS From lhh at redhat.com Mon Jun 9 21:35:54 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 09 Jun 2008 17:35:54 -0400 Subject: [Linux-cluster] CS5 / quorum disk and heuristics In-Reply-To: <484D20B1.9040502@bull.net> References: <484D20B1.9040502@bull.net> Message-ID: <1213047355.27637.20.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-06-09 at 14:23 +0200, Alain Moulle wrote: > My last understanding was that quorum disk was NOT a redundancy of heart-beat, > meaning that if heart-beat interface fails, there is a failover but it is > always the node with the expected min_score in quorum disk which fences the > other. Qdiskd can never tell CMAN or openais that a computer is a member of the cluster, but it can remove nodes from the cluster. > So I thought that the quorum disk check was operational only if the node > detects a problem on heart-beat interface ... but when I set down the interface > on the third machine, and after a few seconds, both nodes node1/node2 > are killed !!! Think of the heuristics as asking the question: "Am I fit to participate in the cluster?" If the answer is "yes" and suddenly changes to "no", the node removes itself. > Whereas heart-beat interface was working fine. You can disable these by setting allow_kill="0" and/or reboot="0" (see qdisk(5)). > And after reboot, I can see "cluster not quorate" etc. Does this happen after both nodes boot, or just one? If both nodes boot up with the third node off, they should still be able to form a quorum by themselves, even if qdiskd isn't running or its score isn't sufficient. -- Lon From lstrozzini at gmail.com Tue Jun 10 08:18:40 2008 From: lstrozzini at gmail.com (Loris Strozzini) Date: Tue, 10 Jun 2008 10:18:40 +0200 Subject: [Linux-cluster] Basic RHEL 5.1 cluster problem Message-ID: <4b28518b0806100118r10c908b6m8c3e3321355ab180@mail.gmail.com> Hi all, I have a problem with my RHEL 5.2 two-node cluster running on IBM X3650.
My cluster is configured for fencing through the IBM RSA II adapters via system-config-cluster, with only one network interface and no shared storage, and I have followed the Red Hat Cluster Suite documentation for the installation. At first glance there is no syntax error in my cluster.conf, but when I start the cman and rgmanager daemons on the primary node, the other node reboots or powers off immediately. Can anyone help me? Thanks in advance Loris My cluster.conf:
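(For illustration only: a minimal two-node cluster.conf using the fence_rsa agent generally looks something like the sketch below. All node names, device names, IP addresses and credentials are placeholders, not values taken from this cluster, so treat it as a rough template rather than a known-good configuration.)

<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">
  <!-- two_node/expected_votes let a two-node cluster be quorate with a single vote -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="rsa-node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="rsa-node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <!-- fence_rsa talks to the RSA II service processor of each node -->
    <fencedevice agent="fence_rsa" name="rsa-node1" ipaddr="10.0.0.101" login="USERID" passwd="PASSW0RD"/>
    <fencedevice agent="fence_rsa" name="rsa-node2" ipaddr="10.0.0.102" login="USERID" passwd="PASSW0RD"/>
  </fencedevices>
  <!-- a generous post_join_delay gives the second node time to join before fenced acts -->
  <fence_daemon post_join_delay="60" post_fail_delay="0"/>
  <rm/>
</cluster>

A symptom like the one described (starting cman on one node immediately power-cycles the other) is often just startup fencing: if the peer has not joined the fence domain within post_join_delay seconds, fenced fences it, so raising post_join_delay or starting cman on both nodes at roughly the same time usually avoids it.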