From tedley at gmail.com Fri Aug 1 01:01:57 2008
From: tedley at gmail.com (ted)
Date: Thu, 31 Jul 2008 21:01:57 -0400
Subject: [Linux-cluster] Some nodes won't join after being fenced
In-Reply-To: <824ffea00807311325u186e8129kf5218e6dbc2a4d06@mail.gmail.com>
References: <48920785.4060300@adamdein.com> <824ffea00807311325u186e8129kf5218e6dbc2a4d06@mail.gmail.com>
Message-ID:

We seem to have found part of the culprit. We're using an Extreme switch that handles all of our traffic in separate VLANs, and the IGMP handling in ExtremeOS seems to be interfering with the cluster's ability to recover itself from such an episode.

At the moment we're leaning towards the Juniper switch: we moved to an identically configured (as far as ports and VLANs go) Juniper EX-4200 and the cluster was able to recover itself with a single node (of nine) being fenced, while on the Extreme each node needed to be fenced in turn before the cluster could recover fully. By "recover" we mean each node being able to mount the GFS filesystem r/w and actually write and delete test files on the mount point.

Our testing continues and we're trying to come up with "real" evidence, such as proof that some parts of the multicast traffic are or aren't being handled properly. So far the empirical evidence supports the above conclusions.

-ted

On 7/31/08, Brandon Young wrote:
> I have occasionally run into this problem, too. I have found that sometimes I can work around the problem by chkconfig'ing clvmd, cman, and rgmanager off, rebooting, then manually starting cman, rgmanager, clvmd (in that order). Usually, after that, I am able to fence the node(s) and they will rejoin automatically (after re-enabling automatic startup with chkconfig, of course). I know this workaround doesn't explain *why* it happens, but it has more than once helped me get my cluster nodes back online without having to reboot all the nodes.
>
> On Thu, Jul 31, 2008 at 1:42 PM, Mailing List wrote:
>
>> Hello,
>>
>> I currently have a 9-node CentOS 5.1 cman/gfs cluster which I've managed to break.
>>
>> It is broken in almost exactly the same way as described in these two previous threads:
>>
>> http://www.spinics.net/lists/cluster/msg10304.html
>> http://www.redhat.com/archives/linux-cluster/2008-May/msg00060.html
>>
>> However, I can find no resolution in the archives. My only guaranteed resolution at this point is a cold restart of all nodes, which to me seems ridiculous (i.e. I'm missing something).
>>
>> To add a little detail, I have nodes cluster1...9. Nodes 7 & 8 are broken. When I fence/reboot them, cman starts but times out on starting fencing. cman_tool nodes shows them as joined, but the fence domain looks broken.
>>
>> Any ideas?
>>
>> I have included some information for a good node, a bad node, and /var/log/messages from a good node that did the fencing.
>> >> Good Node: >> >> [root at cluster1 ~]# cman_tool nodes >> Node Sts Inc Joined Name >> 1 M 768 2008-07-31 12:47:19 cluster1-rhc >> 2 M 776 2008-07-31 12:47:37 cluster2-rhc >> 3 M 772 2008-07-31 12:47:19 cluster3-rhc >> 4 M 788 2008-07-31 12:56:20 cluster4-rhc >> 5 M 772 2008-07-31 12:47:19 cluster5-rhc >> 6 M 784 2008-07-31 12:52:50 cluster6-rhc >> 7 M 808 2008-07-31 13:24:24 cluster7-rhc >> 8 X 800 cluster8-rhc >> 9 M 772 2008-07-31 12:47:19 cluster9-rhc >> [root at cluster1 ~]# cman_tool services >> type level name id state >> fence 0 default 00010003 FAIL_START_WAIT >> [1 2 3 4 5 6 9] >> dlm 1 testgfs1 00020005 none >> [1 2 3 4 5 6] >> gfs 2 testgfs1 00010005 none >> [1 2 3 4 5 6] >> [root at cluster1 ~]# cman_tool status >> Version: 6.1.0 >> Config Version: 13 >> Cluster Name: test >> Cluster Id: 1678 >> Cluster Member: Yes >> Cluster Generation: 808 >> Membership state: Cluster-Member >> Nodes: 8 >> Expected votes: 9 >> Total votes: 8 >> Quorum: 5 >> Active subsystems: 7 >> Flags: Dirty >> Ports Bound: 0 >> Node name: cluster1-rhc >> Node ID: 1 >> Multicast addresses: 239.192.6.148 >> Node addresses: 10.128.161.81 >> [root at cluster1 ~]# group_tool >> type level name id state >> fence 0 default 00010003 FAIL_START_WAIT >> [1 2 3 4 5 6 9] >> dlm 1 testgfs1 00020005 none >> [1 2 3 4 5 6] >> gfs 2 testgfs1 00010005 none >> [1 2 3 4 5 6] >> [root at cluster1 ~]# >> >> >> Bad/broken Node: >> >> [root at cluster7 ~]# cman_tool nodes >> Node Sts Inc Joined Name >> 1 M 808 2008-07-31 13:24:24 cluster1-rhc >> 2 M 808 2008-07-31 13:24:24 cluster2-rhc >> 3 M 808 2008-07-31 13:24:24 cluster3-rhc >> 4 M 808 2008-07-31 13:24:24 cluster4-rhc >> 5 M 808 2008-07-31 13:24:24 cluster5-rhc >> 6 M 808 2008-07-31 13:24:24 cluster6-rhc >> 7 M 804 2008-07-31 13:24:24 cluster7-rhc >> 8 X 0 cluster8-rhc >> 9 M 808 2008-07-31 13:24:24 cluster9-rhc >> [root at cluster7 ~]# cman_tool services >> type level name id state >> fence 0 default 00000000 JOIN_STOP_WAIT >> [1 2 3 4 5 6 7 9] >> [root at cluster7 ~]# cman_tool status >> Version: 6.1.0 >> Config Version: 13 >> Cluster Name: test >> Cluster Id: 1678 >> Cluster Member: Yes >> Cluster Generation: 808 >> Membership state: Cluster-Member >> Nodes: 8 >> Expected votes: 9 >> Total votes: 8 >> Quorum: 5 >> Active subsystems: 7 >> Flags: Dirty >> Ports Bound: 0 >> Node name: cluster7-rhc >> Node ID: 7 >> Multicast addresses: 239.192.6.148 >> Node addresses: 10.128.161.87 >> [root at cluster7 ~]# group_tool >> type level name id state >> fence 0 default 00000000 JOIN_STOP_WAIT >> [1 2 3 4 5 6 7 9] >> [root at cluster7 ~]# >> >> >> /var/log/messages: >> >> Jul 31 13:20:54 cluster3 fence_node[3813]: Fence of "cluster7-rhc" was >> successful >> Jul 31 13:21:03 cluster3 fence_node[3815]: Fence of "cluster8-rhc" was >> successful >> Jul 31 13:21:11 cluster3 openais[3084]: [TOTEM] entering GATHER state from >> 12. >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering GATHER state from >> 11. >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Saving state aru 89 high >> seq received 89 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Storing new sequence id >> for ring 324 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering COMMIT state. >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering RECOVERY state. 
>> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [0] member >> 10.128.161.81: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [1] member >> 10.128.161.82: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [2] member >> 10.128.161.83: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 7 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 kernel: dlm: closing connection to node 8 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [3] member >> 10.128.161.84: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [4] member >> 10.128.161.85: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [5] member >> 10.128.161.86: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] position [6] member >> 10.128.161.89: >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] previous ring seq 800 rep >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] aru 89 high delivered 89 >> received flag 1 >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] Did not need to originate >> any messages in recovery. 
>> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] CLM CONFIGURATION CHANGE >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] New Configuration: >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.81) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.82) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.83) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.84) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.85) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.86) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.89) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] Members Left: >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.87) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.88) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] Members Joined: >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] CLM CONFIGURATION CHANGE >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] New Configuration: >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.81) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.82) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.83) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.84) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.85) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.86) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.89) >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] Members Left: >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] Members Joined: >> Jul 31 13:21:16 cluster3 openais[3084]: [SYNC ] This node is within the >> primary component and will provide service. >> Jul 31 13:21:16 cluster3 openais[3084]: [TOTEM] entering OPERATIONAL >> state. >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.81 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.82 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.83 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.84 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.85 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.86 >> Jul 31 13:21:16 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.89 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 2 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 3 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 4 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 5 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 6 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 9 >> Jul 31 13:21:16 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering GATHER state from >> 11. >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Saving state aru 68 high >> seq received 68 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Storing new sequence id >> for ring 328 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering COMMIT state. 
>> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering RECOVERY state. >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [0] member >> 10.128.161.81: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [1] member >> 10.128.161.82: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [2] member >> 10.128.161.83: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [3] member >> 10.128.161.84: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [4] member >> 10.128.161.85: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [5] member >> 10.128.161.86: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [6] member >> 10.128.161.87: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.87 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 9 high delivered 9 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] position [7] member >> 10.128.161.89: >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] previous ring seq 804 rep >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] aru 68 high delivered 68 >> received flag 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] Did not need to originate >> any messages in recovery. 
>> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] CLM CONFIGURATION CHANGE >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] New Configuration: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.81) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.82) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.83) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.84) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.85) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.86) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.89) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] Members Left: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] Members Joined: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] CLM CONFIGURATION CHANGE >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] New Configuration: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.81) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.82) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.83) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.84) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.85) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.86) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.87) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.89) >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] Members Left: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] Members Joined: >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] r(0) ip( >> 10.128.161.87) >> Jul 31 13:24:24 cluster3 openais[3084]: [SYNC ] This node is within the >> primary component and will provide service. >> Jul 31 13:24:24 cluster3 openais[3084]: [TOTEM] entering OPERATIONAL >> state. >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.81 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.82 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.83 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.84 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.85 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.86 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.87 >> Jul 31 13:24:24 cluster3 openais[3084]: [CLM ] got nodejoin message >> 10.128.161.89 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 6 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 9 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 1 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 2 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 3 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 4 >> Jul 31 13:24:24 cluster3 openais[3084]: [CPG ] got joinlist message from >> node 5 >> >> Thanks! 
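The recovery sequences above depend entirely on every node receiving the cluster's multicast traffic (239.192.6.148 in the cman_tool status output). If a switch's IGMP snooping is suspected, as in ted's report at the top of this message, a quick check on each node is to confirm the group membership and then watch whether totem packets from the other members actually arrive. A minimal sketch, assuming the cluster interface is eth0 (adjust the interface and address to match your setup):

  # show the multicast groups joined on the cluster interface
  ip maddr show dev eth0        # or: cat /proc/net/igmp
  # watch whether totem/openais traffic from the other nodes arrives
  tcpdump -n -i eth0 udp and host 239.192.6.148

If one node stops seeing the other members' packets here while they keep seeing each other, the IGMP snooping/querier configuration on the switch is the first place to look.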
>> >> Adam >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Fri Aug 1 09:17:50 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 1 Aug 2008 11:17:50 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.07 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 2.99.07 release from the master branch. The development cycle for 3.0 is proceeding at a very good speed and mostlikely one of the next releases will be 3.0alpha1. All features designed for 3.0 are being completed and taking a proper shape, the library API has been stable for sometime (and will soon be marked as 3.0 soname). Stay tuned for upcoming updates! The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.07 release you will need: - - openais svn r1579. Porting to corosync is a work in progress. - - linux kernel (2.6.26) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but userland can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.07.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.99.07.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch https://fedorahosted.org/releases/c/l/cluster/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.06): Andrew Price (1): [GFS2] libgfs2: Build with -fPIC Bob Peterson (14): Print log header flags for gfs journals. Speed up userspace bitmap manipulation code. gfs_fsck crosswrite for block number sanity checking Fix some bad references to gfs_tool and gfs_fsck Deleted unused function print_map Shrink memory 1: eliminate b_size from pseudo-buffer-heads Shrink memory 2: get rid of 3 huge in-core bitmaps Shrink memory 3: smaller link counts in inode_info Better error reporting in gfs2_fsck RGRepair: Account for RG blocks inside journals gfs2_fsck dupl. blocks between EA and data gfs2_edit: Ability to enter "journalX" in block number. gfs2_edit: was parsing out gfs1 log descriptors improperly gfs2_edit: Improved gfs journal dumps Christine Caulfield (13): [CCS] Set errno when an error occurs. [CMAN] Don't use logsys in config modules. Revert "[CMAN] Don't use logsys in config modules." 
[CMAN] Don't use logsys in config modules. [CCS] Fold ccs_test into ccs_tool and tidy [CCS] add -c flag to ccs_tool query [CONFIG] Add some more errnos to libccsconfdb [CCS] Set return status on failure [CCS] Make ccs_tool/ccs_test more consistent [CMAN] Fix overridden node names [CMAN] pass COROSYNC_ env variables to the daemon [CMAN] Display the node's votes in cman_tool status qdisk: fix compile error when building without debug. David Teigland (19): gfs_controld: change start message from new members gfs_controld: add missing endian conversion gfs_controld: byte swap ids earlier gfs_controld: close dlm_controld connection fenced: improved start messages fenced: munge config option code fenced: debug logsys options dlm_controld: improved start messages fenced: complete messages copy start messages fenced: munge logging dlm_controld: use logsys gfs_controld: use logsys dlm_controld/gfs_controld: add logging.c file groupd: use logsys groupd: detect group_mode fenced: use group_mode detection dlm_controld: use group_mode detection gfs_controld: use group_mode detection fence_tool: add domain member checks Fabio M. Di Nitto (42): [CCS] Fix LEGACY_CODE ifdef [BUILD] Implement --enable_legacy_code in the build system [BUILD] Add ccs_test replacement when building legacy_code [BUILD] Fix ccs.h include path [BUILD] Fix doc install target when building objects outside source tree [CCS] Kill obsolted ccs_test [RGMANAGER] Port all resource agents to new ccs interface [RGMANAGER] Port smb resource agent to ccs_tool [BUILD] Fix race condition in oldconfig update/execution [RGMANAGER] Use proper ccs_tool query output [BUILD] Fix ccs_tool/ccs_test build with new compat code [CCS] Inflict hopefully last compat issues love to ccs_t* Revert "[RGMANAGER] Use proper ccs_tool query output" [RGMANAGER] Port ccs_get to proper ccs_tool output [RGMANGER] Fix call to ccs_tool [BUILD] Fix ccs_tool linking dir order [BUILD] Fix logrotate snippet filename [FENCE] Sync fence_apc_snmp from RHEL47 branch [BUILD] Fix LOGDIR usage [FENCE] Fix fence_apc_snmp logging [BUILD] Cleanup linking order for logsys [BUILD] Cleanup groupd makefile build: update .gitignore Revert "fence: port scsi agent to use ccs_tool query and drop XML::LibXML requirement" Revert "fence: simplify init script" Revert "rgmanger: remove check on cluster.conf from rgmanager init script" rgmanger: remove check on cluster.conf from rgmanager init script fence: simplify init script fence: port scsi agent to use ccs_tool query and drop XML::LibXML requirement rgmanager: fix clean target cman: init script should not user cluster.conf directly rgmanager: init script does not need network config config: allow users to override default config file in xmlconfig test commit Revert "test commit" bindings: add first cut of perl Cluster:CCS bindings: improve Cluster::CCS description build: clean up perl bindings build system misc: clean up "char const *" vs "const char *" init: standardize init scripts to /etc/sysconfig/cluster build: fix bindings build when using external object tree bindings: fix CCS.pm doc Lon Hohberger (2): [rgmanager] Add optional save/restore to vm resource [qdisk] Make stop_cman="1" work if heuristics fail during initialization Ryan McCabe (1): fence: update apc snmp agent Ryan O'Hara (3): gfs_mkfs: change the way we check to see if a device is mounted cman: add option to init script to prevent joining the fence domain cman: fix typo (#!/bin/bash) from previous commit .gitignore | 7 + bindings/perl/Makefile | 4 +- 
bindings/perl/ccs/CCS.pm.in | 145 +++++ bindings/perl/ccs/CCS.xs | 82 +++ bindings/perl/ccs/MANIFEST | 7 + bindings/perl/ccs/META.yml.in | 13 + bindings/perl/ccs/Makefile.PL | 28 + bindings/perl/ccs/Makefile.bindings | 11 + bindings/perl/ccs/test.pl | 20 + bindings/perl/ccs/typemap | 1 + ccs/ccs_tool/Makefile | 35 +- ccs/ccs_tool/ccs_tool.c | 261 ++++++++- ccs/ccs_tool/old_parser.c | 688 ---------------------- ccs/ccs_tool/old_parser.h | 64 -- ccs/ccs_tool/upgrade.c | 259 -------- ccs/ccs_tool/upgrade.h | 6 - ccs/libccscompat/libccscompat.h | 2 +- ccs/man/Makefile | 5 + ccs/man/ccs_test.8 | 132 +++++ cman/cman_tool/cman_tool.h | 2 +- cman/cman_tool/join.c | 19 +- cman/cman_tool/main.c | 7 +- cman/daemon/cman-preconfig.c | 35 +- cman/init.d/Makefile | 16 +- cman/init.d/cman | 648 ++++++++++++++++++++ cman/init.d/cman.in | 592 ------------------- cman/qdisk/main.c | 4 +- config/libs/libccsconfdb/ccs.h | 2 +- config/libs/libccsconfdb/libccs.c | 69 ++- config/plugins/ldap/configldap.c | 10 +- config/plugins/xml/config.c | 20 +- config/tools/Makefile | 2 +- config/tools/ccs_test/Makefile | 32 - config/tools/ccs_test/ccs_test.c | 147 ----- config/tools/man/Makefile | 2 +- config/tools/man/ccs_test.8 | 132 ----- configure | 23 +- doc/Makefile | 6 +- fence/agents/apc_snmp/fence_apc_snmp.py | 581 +++++++++++-------- fence/agents/scsi/fence_scsi.pl | 22 +- fence/agents/scsi/fence_scsi_test.pl | 26 +- fence/agents/scsi/scsi_reserve | 24 +- fence/fence_tool/fence_tool.c | 260 ++++----- fence/fenced/Makefile | 6 +- fence/fenced/config.c | 68 ++- fence/fenced/config.h | 29 + fence/fenced/cpg.c | 565 +++++++++++------- fence/fenced/fd.h | 40 +- fence/fenced/group.c | 29 + fence/fenced/logging.c | 42 +- fence/fenced/main.c | 90 ++-- fence/fenced/member_cman.c | 3 +- fence/fenced/recover.c | 21 +- fence/libfenced/libfenced.h | 3 + gfs/gfs_mkfs/main.c | 29 +- gfs2/edit/hexedit.c | 290 +++++++--- gfs2/edit/savemeta.c | 9 +- gfs2/fsck/eattr.c | 21 +- gfs2/fsck/eattr.h | 20 +- gfs2/fsck/fs_recovery.c | 4 +- gfs2/fsck/fsck.h | 5 +- gfs2/fsck/initialize.c | 10 +- gfs2/fsck/lost_n_found.c | 7 +- gfs2/fsck/main.c | 35 +- gfs2/fsck/metawalk.c | 177 ++++-- gfs2/fsck/metawalk.h | 16 +- gfs2/fsck/pass1.c | 405 +++++++++----- gfs2/fsck/pass1b.c | 95 ++-- gfs2/fsck/pass1c.c | 69 ++- gfs2/fsck/pass2.c | 61 ++- gfs2/fsck/pass3.c | 20 +- gfs2/fsck/pass4.c | 11 +- gfs2/fsck/pass5.c | 2 +- gfs2/fsck/rgrepair.c | 58 ++- gfs2/libgfs2/Makefile | 1 + gfs2/libgfs2/bitmap.c | 79 ++- gfs2/libgfs2/block_list.c | 232 ++++---- gfs2/libgfs2/buf.c | 1 - gfs2/libgfs2/fs_bits.c | 2 +- gfs2/libgfs2/fs_ops.c | 38 +- gfs2/libgfs2/libgfs2.h | 93 ++- gfs2/libgfs2/recovery.c | 2 +- gfs2/libgfs2/rgrp.c | 8 + group/daemon/Makefile | 10 +- group/daemon/app.c | 3 + group/daemon/cpg.c | 369 ++++++++++++ group/daemon/gd_internal.h | 51 ++- group/daemon/logging.c | 170 ++++++ group/daemon/main.c | 177 ++++++- group/dlm_controld/Makefile | 8 +- group/dlm_controld/config.c | 39 ++- group/dlm_controld/config.h | 5 +- group/dlm_controld/cpg.c | 350 ++++++------ group/dlm_controld/dlm_daemon.h | 34 +- group/dlm_controld/group.c | 29 + group/dlm_controld/logging.c | 171 ++++++ group/dlm_controld/main.c | 63 +-- group/dlm_controld/member_cman.c | 3 +- group/gfs_controld/Makefile | 6 +- group/gfs_controld/config.c | 59 ++- group/gfs_controld/config.h | 5 +- group/gfs_controld/cpg-new.c | 188 ++++--- group/gfs_controld/gfs_daemon.h | 44 ++- group/gfs_controld/group.c | 29 + group/gfs_controld/logging.c | 171 ++++++ group/gfs_controld/main.c | 52 ++- 
group/gfs_controld/member_cman.c | 1 + group/gfs_controld/util.c | 1 + group/lib/libgroup.c | 25 + group/lib/libgroup.h | 2 + make/binding-passthrough.mk | 7 + make/defines.mk.input | 3 +- make/fencebuild.mk | 1 + make/install.mk | 4 +- make/perl-binding-common.mk | 30 + rgmanager/init.d/Makefile | 12 +- rgmanager/init.d/rgmanager | 141 +++++ rgmanager/init.d/rgmanager.in | 154 ----- rgmanager/src/resources/apache.sh | 11 +- rgmanager/src/resources/mysql.sh | 12 +- rgmanager/src/resources/named.sh | 11 +- rgmanager/src/resources/openldap.sh | 12 +- rgmanager/src/resources/postgres-8.sh | 12 +- rgmanager/src/resources/samba.sh | 12 +- rgmanager/src/resources/smb.sh | 104 +--- rgmanager/src/resources/tomcat-5.sh | 12 +- rgmanager/src/resources/utils/config-utils.sh.in | 66 +-- rgmanager/src/resources/utils/messages.sh | 4 - rgmanager/src/resources/vm.sh | 30 + 129 files changed, 5659 insertions(+), 4191 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSJLUxAgUGcMLQ3qJAQKMCw//Ud5jm6xhZlrUJvAhB3JsnromDFEgJiwt KYFJ+pzmvfTvkw3q+SyJu8vBSvJ3tFVeu1/fIiFGtVJSiucROKl3ToDhjDUz1Y+4 OYvyMdPMHlw1GK92XnCA8cnKFlejnSMTvgSpfJkWWsOfp/MKB5zwrUBaSKAdutPV d7Y4nD8zEKhLWgZ76flrq5uPOvGTazU6Q3aNMJJIhyDkrLNSBOTEjIWBRtwtAAMq RX4mv0aQCgcRPat602BiAVb8+DVHmmxFkjmWjnARi8LypMOxxAEZX5g8dFFWPMC7 C5Quul6AhjAfbzWkOxINjk8aa/i7USqSkwmVkNnkifrcGFdH+Su3pDMzGAOpWSqO 4UPZF00rKqr8hH51BDufCtebieZ5qIyE2yBLpuQSqs5ZGk7oSaa0cog3QqUqhvDf d32QIbRZ/bR6ChJnQu2IHH8FNZGMscsnkPcNt2BzXVYsgQMJUJtWf44r3H2jCWoO bsjT1EDJIAgM3urYm09o/jURW8eckYlA5oH5xuQuydOYRr5EKW31W0LNP4PMfWSR WNBAs0U3vB0RI41v40IqyRWmNqoOIdkBJe59Kb9r5z0Z/AvbASVUES3FCjLv12tY Gn4CEqiL1ti7kGZpX73W+1ydvYO+ZQUvqP4bfqYNLwB1OPrsUXT6rG5wx2lWs+rn XAqCkmBqcKo= =IH1P -----END PGP SIGNATURE----- From balajisundar at midascomm.com Fri Aug 1 10:06:49 2008 From: balajisundar at midascomm.com (Balaji) Date: Fri, 01 Aug 2008 15:36:49 +0530 Subject: [Linux-cluster] HP ILO Fence Configuration Message-ID: <4892E039.3050701@midascomm.com> Dear All, Currently i am using HP x6600 Server and I have installed RHEL4 Update 4 AS Linux and RHEL4 Update 4 Support Cluster Suite in my server I am new in fence and can any one help me how to configure HP ILO fence in my server and HP ILO Fence Functionality Regards -S.Balaji From ajeet.singh.raina at logica.com Fri Aug 1 10:16:05 2008 From: ajeet.singh.raina at logica.com (Singh Raina, Ajeet) Date: Fri, 1 Aug 2008 15:46:05 +0530 Subject: [Linux-cluster] Directories gets Deleted during Failover Message-ID: <0139539A634FD04A99C9B8880AB70CB209B179E6@in-ex004.groupinfra.com> Hi, I have been busy setting up Two Node cluster Setup and find that during the failover the directories created under mount point gets deleted. Please do let me know why it is behaving so? ajeet This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdinitto at redhat.com Fri Aug 1 11:08:46 2008 From: fdinitto at redhat.com (Fabio M. 
Di Nitto) Date: Fri, 1 Aug 2008 13:08:46 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.03.06 released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its vibrant community are proud to announce the 7th release from the STABLE2 branch: 2.03.06. The STABLE2 branch collects, on a daily base, all bug fixes and the bare minimal changes required to run the cluster on top of the most recent Linux kernel (2.6.26) and rock solid openais (0.80.3). The 2.03.06 release features porting to the 2.6.26 kernel for the kernel modules and userland. Userland can also run in compatibility mode with 2.6.25 kernel. NOTE The stable2 branch will not build on top of corosync/openais new tree for this release. The very latest code from openais that can be used is svn r1579. Porting to corosync will happen in future. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.06.tar.gz https://fedorahosted.org/releases/c/l/cluster/cluster-2.03.06.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch https://fedorahosted.org/releases/c/l/cluster/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.03.05): Bob Peterson (15): Replace put_inode with drop_inode Print log header flags for gfs journals. Speed up userspace bitmap manipulation code. gfs_fsck crosswrite for block number sanity checking Fix some bad references to gfs_tool and gfs_fsck Deleted unused function print_map Shrink memory 1: eliminate b_size from pseudo-buffer-heads Shrink memory 2: get rid of 3 huge in-core bitmaps Shrink memory 3: smaller link counts in inode_info Better error reporting in gfs2_fsck RGRepair: Account for RG blocks inside journals gfs2_fsck dupl. blocks between EA and data gfs2_edit: Ability to enter "journalX" in block number. gfs2_edit: was parsing out gfs1 log descriptors improperly gfs2_edit: Improved gfs journal dumps Christine Caulfield (2): [CMAN] Add node votes to 'cman_tool status' output cman: revert dirty patch David Teigland (3): gfs_controld: read plocks from dlm or lock_dlm fenced: update cman only after complete success groupd: ignore nolock gfs Fabio M. Di Nitto (5): [GNBD] Update gnbd to work with 2.6.26 [GFS] Make gfs build with 2.6.26 (DO NOT USE!) 
[GFS] Fix comment [BUILD] Add install/uninstall snippets for documents [FENCE] Sync fence_apc_snmp from RHEL47 branch Lon Hohberger (1): [qdisk] Make stop_cman="1" work if heuristics fail during initialization Ryan McCabe (1): fence: update apc snmp agent Ryan O'Hara (2): gfs_mkfs: change the way we check to see if a device is mounted cman: add option to init script to prevent joining the fence domain cman/cman_tool/main.c | 1 + cman/daemon/commands.c | 3 +- cman/init.d/cman.in | 93 ++++-- cman/qdisk/main.c | 2 + fence/agents/apc_snmp/fence_apc_snmp.py | 581 ++++++++++++++++++------------- fence/fenced/agent.c | 16 +- gfs-kernel/src/gfs/ops_address.c | 2 +- gfs-kernel/src/gfs/ops_super.c | 7 +- gfs-kernel/src/gfs/quota.c | 4 +- gfs/gfs_mkfs/main.c | 29 +- gfs2/edit/hexedit.c | 290 ++++++++++++---- gfs2/edit/savemeta.c | 9 +- gfs2/fsck/eattr.c | 21 +- gfs2/fsck/eattr.h | 20 +- gfs2/fsck/fs_recovery.c | 4 +- gfs2/fsck/fsck.h | 5 +- gfs2/fsck/initialize.c | 10 +- gfs2/fsck/lost_n_found.c | 7 +- gfs2/fsck/main.c | 35 +-- gfs2/fsck/metawalk.c | 177 +++++++---- gfs2/fsck/metawalk.h | 16 +- gfs2/fsck/pass1.c | 405 ++++++++++++++-------- gfs2/fsck/pass1b.c | 95 +++--- gfs2/fsck/pass1c.c | 69 +++-- gfs2/fsck/pass2.c | 61 ++-- gfs2/fsck/pass3.c | 20 +- gfs2/fsck/pass4.c | 11 +- gfs2/fsck/pass5.c | 2 +- gfs2/fsck/rgrepair.c | 58 +++- gfs2/libgfs2/bitmap.c | 79 ++++- gfs2/libgfs2/block_list.c | 232 ++++++------- gfs2/libgfs2/buf.c | 1 - gfs2/libgfs2/fs_bits.c | 2 +- gfs2/libgfs2/fs_ops.c | 38 +- gfs2/libgfs2/libgfs2.h | 93 ++++-- gfs2/libgfs2/recovery.c | 2 +- gfs2/libgfs2/rgrp.c | 8 + gnbd-kernel/src/gnbd.c | 91 +++--- gnbd-kernel/src/gnbd.h | 4 +- group/daemon/main.c | 28 ++- group/gfs_controld/lock_dlm.h | 1 + group/gfs_controld/plock.c | 254 +++++++++++--- make/install.mk | 4 + make/uninstall.mk | 3 + 44 files changed, 1841 insertions(+), 1052 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSJLuxAgUGcMLQ3qJAQJu0xAApnUtjXaP72FlznIFBXHIyIvDWxozRf9u HNSAM7dO94Iu2nCUyuehFKNNzyL80s9U/LhrTZfokxwTqHLp3YGYAzcMJ2WmqDDp DiskzoofGbYp2BT3LBeZKuNeGi+eWoK4C6kfgKMTGpex/1CrSdT4lm/x9kya+zwR h24fl1kp74z+90gcU5aqkwb6GbDdmu9CLmUrufciHsaLAx6Cw96SU794BRpOBNiH zw1deZMHvnNQYlJmBF0icpHS3GbdKF/wNt2m3ux1fPcAsaDRbSLfkyqgxd3qaC8p fOGh1seQIW8iefh/2kJlSmcZ8D2SOnycdyXK7wLKUMOuXNjbxgLHjguXjqaKsg6V oxQaY6IWuczW47KOdti6A3SNU86obz74zc8D+7LXPbf3HC7TIvqvgCwl6RJ7ODSs 0sbgZ6QYZvNlN3hwGnuaE2dh5UgsL5foUgogJSgJ4alTp6RCXPwv8Lm9uGAtcT6l BMull8I/R+/SmLHi8bnXm/w/7HSCziT8CZhXIwXkBTkTkt7V4s30o8QJOAABDxp0 ehavfsjqX/ualz4CKFykEKi3CIbXvXqrxcYrncNd8UWcHrLNQHNbEQ0xsnmrvhgj zVjNWbPnfa/FEOjMjLZ1xqnSXXGpIzR7bjoOy2PUZ3THmhwq85nf9Eyo+56Dzgdi IkL0+pbpH4Q= =brA9 -----END PGP SIGNATURE----- From ozgurakan at gmail.com Fri Aug 1 13:33:59 2008 From: ozgurakan at gmail.com (Ozgur Akan) Date: Fri, 1 Aug 2008 09:33:59 -0400 Subject: [Linux-cluster] network for cluster communication Message-ID: <68f132770808010633t1d6421f2va9adaf388ac7480e@mail.gmail.com> Hi, I have two important questions regardin cluster performance. I attached two ethernet cards as second interfaces on two nodes that I have. - How can I configure cluster to use this new interface (network) to communicate between eachother.? - Is speed of this local network between two nodes an important criteria for file locks on GFS ? thanks, Ozgur Akan -------------- next part -------------- An HTML attachment was scrubbed... 
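On the first question: cman/openais picks the interface from the address that each cluster node name resolves to, so the usual approach is to give the nodes names (or addresses) on the dedicated node-to-node network and use those names in cluster.conf. A minimal sketch with placeholder names and addresses (not the actual hosts in this thread):

  # /etc/hosts on both nodes
  10.0.0.1   node1-priv
  10.0.0.2   node2-priv

  <!-- cluster.conf: node names point at the private interface -->
  <clusternodes>
    <clusternode name="node1-priv" nodeid="1" votes="1"/>
    <clusternode name="node2-priv" nodeid="2" votes="1"/>
  </clusternodes>

On the second question: DLM lock traffic for GFS travels over this same network, so its latency and bandwidth matter a great deal.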
URL: From rpeterso at redhat.com Fri Aug 1 13:34:40 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 01 Aug 2008 08:34:40 -0500 Subject: [Linux-cluster] Directories gets Deleted during Failover In-Reply-To: <0139539A634FD04A99C9B8880AB70CB209B179E6@in-ex004.groupinfra.com> References: <0139539A634FD04A99C9B8880AB70CB209B179E6@in-ex004.groupinfra.com> Message-ID: <1217597680.9521.31.camel@technetium.msp.redhat.com> Hi Ajeet, On Fri, 2008-08-01 at 15:46 +0530, Singh Raina, Ajeet wrote: > Hi, > > I have been busy setting up Two Node cluster Setup and find that > during the failover the directories created under mount point gets > deleted. > > Please do let me know why it is behaving so? You haven't given us enough information. You haven't even said whether the file system is GFS, GFS2, EXT3, XFS, etc., or NFS over one of the above. In general, directories should not just disappear. Perhaps one of your nodes has the file system mounted and the other does not, so when failover occurs, it just looks like the directories are gone? Regards, Bob Peterson Red Hat Clustering & GFS From ccaulfie at redhat.com Fri Aug 1 13:38:38 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 01 Aug 2008 14:38:38 +0100 Subject: [Linux-cluster] network for cluster communication In-Reply-To: <68f132770808010633t1d6421f2va9adaf388ac7480e@mail.gmail.com> References: <68f132770808010633t1d6421f2va9adaf388ac7480e@mail.gmail.com> Message-ID: <489311DE.6050701@redhat.com> Ozgur Akan wrote: > Hi, > > I have two important questions regardin cluster performance. > > I attached two ethernet cards as second interfaces on two nodes that I > have. > > - How can I configure cluster to use this new interface (network) to > communicate between eachother.? Put the host name or IP address of the new interface in cluster.conf, in place of the existing host names. > - Is speed of this local network between two nodes an important criteria > for file locks on GFS ? > Yes, very :) Chrissie From lhh at redhat.com Fri Aug 1 19:34:02 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 01 Aug 2008 15:34:02 -0400 Subject: [Linux-cluster] "Inc" column description/semnification In-Reply-To: <200807311004.23788.linux@vfemail.net> References: <200807301452.41459.linux@vfemail.net> <1217439408.30587.195.camel@ayanami> <200807311004.23788.linux@vfemail.net> Message-ID: <1217619242.11524.214.camel@ayanami> On Thu, 2008-07-31 at 10:04 +0300, Alex wrote: > On Wednesday 30 July 2008 20:36, Lon Hohberger wrote: > > On Wed, 2008-07-30 at 14:52 +0300, Alex wrote: > > > Hello, > > > > > > What does it mean "Inc" column in the output of the cman_tool nodes > > > command? > > > > > > [root at rs2 ~]# cman_tool nodes > > > Node Sts Inc Joined Name > > > 1 M 8 2008-07-30 11:03:12 192.168.113.5 > > > 2 M 4 2008-07-30 10:59:34 192.168.113.4 > > > [root at rs2 ~]# > > > > > > Can anybody tell me what represent 4 and 8 in Inc coulmn? > > > > Local incarnation # for the node, if I recall correctly. They usually > > do not match cluster-wide. > > Because we know what is its name, let me ask you about Inc signification, how > can be interpreted and what represent 8 and 4 in above column... 8m, 8pps, > 8kbps, 8kv, womans, mans, aliens? In manual and documentation is absolutely > missing any info about Inc column! I'm pretty sure it's the Totem protocol sequence # the local node recorded for when it first "saw" the node. 
The "Joined" time is the same thing, except it's according to the local node's clock instead of the Totem token sequence #. That's all they are. They don't indicate anything useful for monitoring. > And another question: why numbers in Inc column is changing everytime a node > is rebooted and remain constant till next reboot? The sequence # is different the next time the node is "seen". You'll also notice the "Joined" value is different. The "Inc" column and "Joined" column are set at the same time but are not related to each other value-wise. -- Lon From lhh at redhat.com Fri Aug 1 19:38:18 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 01 Aug 2008 15:38:18 -0400 Subject: [Linux-cluster] how to mount a gfs2 volume on all our real webservers in /var/www/html In-Reply-To: <200807311322.46231.linux@vfemail.net> References: <200807311143.22407.linux@vfemail.net> <48917D66.7050801@aokaifh.cn> <200807311322.46231.linux@vfemail.net> Message-ID: <1217619498.11524.220.camel@ayanami> On Thu, 2008-07-31 at 13:22 +0300, Alex wrote: > On Thursday 31 July 2008 11:52, ??? wrote: > > This is a typical LVS model. > > Indeed is a LVS. I have an router in front of rs1, rs2, rs3 webservers which > is configured as LVS with load balancing. > > > Do not add your httpd script and mount script into source in your > > cluster.conf > > In redhat howto "Example of Setting Up Apache HTTP Server" they are saying to > not start httpd server at boot time and leave the cluster to do that! Thats > why i added http_service in my cluster.conf. It's a different use case than what you want. The one in the documentation you were reading is referring to failover of a single instance of httpd, not running httpd on 3 nodes at the same time. * put your gfs2 volumes in /etc/fstab * turn on httpd -- Lon From lhh at redhat.com Fri Aug 1 19:40:28 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 01 Aug 2008 15:40:28 -0400 Subject: [Linux-cluster] 2 questions regarding gfs and gfs2 In-Reply-To: <200807311158.29161.linux@vfemail.net> References: <200807311158.29161.linux@vfemail.net> Message-ID: <1217619628.11524.224.camel@ayanami> On Thu, 2008-07-31 at 11:58 +0300, Alex wrote: > Hello, > > Using conga, to generate cluster.conf file i saw by default, when is choosen > GFS File system, in cluster.conf file is generated fsid="35790" and > fstype="gfs" vor a gfs volume. > > [snip from my cluster.conf] > clusterfs device="/dev/myvg1/mylv1" force_unmount="0" > fsid="35790" fstype="gfs" mountpoint="/var/www/html > > With this config, mylv1 has failed to mount because /dev/myvg1/mylv1 is gfs2 > formatted. In this case, I changed manually in cluster.conf > fstype="gfs2" (leaving unchanged fsid="35790"), and now mylv1 is mounted > without problem. > > Questions: > - GFS2 has the same fsid as GFS? If not, which value is correct? fsid is not related to file system types, it's for preserving NFS client file handles in the event of a server-side failover when devices do not match up. > - On centos-5.2, i saw that by default is used GFS2, which many peoples says > that is not good for production use. Is this true or in centos/rhel-5.2 this > has been changed and GFS2 is enough mature to be considered "production > quality"? No, it's not yet production quality. 
-- Lon > > Regards, > Alx > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lhh at redhat.com Fri Aug 1 19:48:15 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 01 Aug 2008 15:48:15 -0400 Subject: [Linux-cluster] Directories gets Deleted during Failover In-Reply-To: <0139539A634FD04A99C9B8880AB70CB209B179E6@in-ex004.groupinfra.com> References: <0139539A634FD04A99C9B8880AB70CB209B179E6@in-ex004.groupinfra.com> Message-ID: <1217620095.11524.228.camel@ayanami> On Fri, 2008-08-01 at 15:46 +0530, Singh Raina, Ajeet wrote: > Hi, > > I have been busy setting up Two Node cluster Setup and find that > during the failover the directories created under mount point gets > deleted. > > Please do let me know why it is behaving so? Cluster.conf! Cluster.conf! -- Lon From j.buzzard at dundee.ac.uk Sat Aug 2 22:52:14 2008 From: j.buzzard at dundee.ac.uk (Jonathan Buzzard) Date: Sat, 02 Aug 2008 23:52:14 +0100 Subject: [Linux-cluster] Fencing using iDRAC/ Dell M600 In-Reply-To: References: <824ffea00807291305w4c542f2fr764ae54a29585897@mail.gmail.com> Message-ID: <4894E51E.2050205@dundee.ac.uk> David J Craigon wrote: > Are you sure you are using an actual M600 blade chassis? On the ones > I've got, they speak a different language after the telnet from other > DRAC cards, hence the problem. > Indeed, they are SMASH-CLP http://publib.boulder.ibm.com/infocenter/toolsctr/v1r0/index.jsp?topic=/com.ibm.smash1_3.doc/smash_t_usingclp.html As far as I can make out it is designed to be a vendor neutral out of band management processor interface. So a DRAC, ILO, LOM, etc. all look the same. I guess in about 10 years when everything in the data centre has such an interface it will make life simpler in multi vendor environments. It is full of XML goodness if that sort of stuff is your cup of tea, and is supposed to be easier to script up. You can get it on a standard DRAC5 by issuing a smclp command after login. All that said it is the most tortuous pile of dino droppings I have had the misfortune to use. Not helped by a lack of documentation. Looks like it came right out of the same committee that dreamt up ACPI. JAB. -- Jonathan A. Buzzard Tel: +441382-386998 Storage Administrator, College of Life Sciences University of Dundee, DD1 5EH From brettcave at gmail.com Mon Aug 4 08:11:26 2008 From: brettcave at gmail.com (Brett Cave) Date: Mon, 4 Aug 2008 10:11:26 +0200 Subject: [Linux-cluster] HP ILO Fence Configuration In-Reply-To: <4892E039.3050701@midascomm.com> References: <4892E039.3050701@midascomm.com> Message-ID: On Fri, Aug 1, 2008 at 12:06 PM, Balaji wrote: > Dear All, > > Currently i am using HP x6600 Server and I have installed RHEL4 Update 4 AS > Linux and > RHEL4 Update 4 Support Cluster Suite in my server > I am new in fence and can any one help me how to configure HP ILO fence in > my server > and HP ILO Fence Functionality I have just set it up, have not tested 100%, but what I have so far is: 1) create fence usernames and passwords ILO on each of your devices. 2) Update cluster.conf as follows: According to the docs, that SHOULD work, I am still having hanging issues on access to certain files / directories on GFS, but still pretty new to it, so not 100% sure whether its related to fencing or not. 
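A typical fence_ilo setup in cluster.conf looks roughly like the following; the node names, iLO host names, login and password here are placeholders, and the exact attribute names should be checked against the fence_ilo man page for your release:

  <clusternode name="node1" nodeid="1" votes="1">
    <fence>
      <method name="1">
        <device name="node1-ilo"/>
      </method>
    </fence>
  </clusternode>
  ...
  <fencedevices>
    <fencedevice agent="fence_ilo" name="node1-ilo"
                 hostname="node1-ilo.example.com" login="fence" passwd="fencepass"/>
  </fencedevices>

Each node gets its own fencedevice entry pointing at that node's iLO, using the fence account created in step 1.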
> Regards > -S.Balaji > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From brettcave at gmail.com Mon Aug 4 09:02:51 2008 From: brettcave at gmail.com (Brett Cave) Date: Mon, 4 Aug 2008 11:02:51 +0200 Subject: [Linux-cluster] How to determine what is causing GFS to hang? Message-ID: Hi, I have a GFS cluster set up on a fibre SAN. Selected output from cman_tool status: Membership state: Cluster-Member Nodes: 6 Expected votes: 11 Total votes: 11 Quorum: 6 Active subsystems: 7 Flags: cman_tool nodes (0 = qdisk): Node Sts Inc Joined Name 0 M 0 2008-07-25 03:00:29 /dev/sda1 1 M 1156 2008-07-25 02:59:16 worker1 2 M 1160 2008-07-25 02:59:20 worker2 # and so on, all sts columns = M, all have valid Joined time, all have different Inc column. cman_tool services - think there might be something here, not sure what to make of this - is this fencing trying to take place?? [root at hecate ~]# cman_tool services type level name id state fence 0 default 00010001 none [1 2 3 4 5 6] dlm 1 storage 00030001 none [1 2 3 4 5 6] dlm 1 cache1 00050001 none [1 2 3 4 5 6] gfs 2 storage 00020001 none [1 2 3 4 5 6] gfs 2 cache1 00040001 none [1 2 3 4 5 6] cache1 and storage are the 2 GFS volumes in the cluster. when I run an "ls" on a directory in storage, it just hangs. How would I get GFS to recover from this? Regards. Brett From ben.yarwood at juno.co.uk Mon Aug 4 11:43:03 2008 From: ben.yarwood at juno.co.uk (Ben Yarwood) Date: Mon, 4 Aug 2008 12:43:03 +0100 Subject: [Linux-cluster] GFS Mounting Issues In-Reply-To: <474534909BE4064E853161350C47578E0BABF8EE@ncrmail1.corp.navcan.ca> References: <474534909BE4064E853161350C47578E0BABF8EE@ncrmail1.corp.navcan.ca> Message-ID: <047101c8f627$410cbb50$c32631f0$@yarwood@juno.co.uk> I pretty sure you need to be running fenced and clvmd as well to get this to work, there was a message relating to this in your original post. /sbin/mount.gfs: node not a member of the default fence domain /sbin/mount.gfs: error mounting lockproto lock_dlm You should see something like this in the output from cman_tool services. type level name id state fence 0 default 00010001 none [1 2] dlm 1 clvmd 00020001 none [1 2] dlm 1 rgmanager 00030001 none [1 2] The fence domain will need to be configured correctly in your cluster.conf file and I believe will start automatically when you start cman. There will probably be some errors in your log stating the fence domain couldn't start up when you started cman. Ben > -----Original Message----- > From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Caron, > Chris > Sent: 31 July 2008 18:30 > To: linux clustering > Subject: RE: [Linux-cluster] GFS Mounting Issues > > Bob, > > Thank you for replying; I should have included more information. I was > going by the bases people assumed a valid cluster was running (but we > should never assume that right? :) ). After your email I ran a few > status tools to report more information in hopes may have helped guide > anyone to an answer. Had you not sent your email, I wouldn't have > uncovered the very odd one at the bottom of this email. > > [root at node01 ~]# service cman status > cman is running. 
> > [root at node01 ~]# clustat > Cluster Status for rhc1 @ Thu Jul 31 13:21:35 2008 > Member Status: Quorate > > Member Name ID Status > ------ ---- ---- ------ > node01.rhc1 1 Online, Local > node02.rhc1 2 Online > node03.rhc1 3 Online > node04.rhc1 4 Online > node05.rhc1 5 Offline > > (Note: I tailored the above output so it wouldn't wrap) > > [root at node01 ~]# service rgmanager status > clurgmgrd (pid 13235) is running... > > [root at node01 ~]# cman_tool status > Version: 6.1.0 > Config Version: 8 > Cluster Name: rhc1 > Cluster Id: 1575 > Cluster Member: Yes > Cluster Generation: 36 > Membership state: Cluster-Member > Nodes: 4 > Expected votes: 5 > Total votes: 4 > Quorum: 3 > Active subsystems: 8 > Flags: Dirty > Ports Bound: 0 177 > Node name: node01.rhc1 > Node ID: 1 > Multicast addresses: > Node addresses: > > This one concerns me : > [root at node01 ~]# cman_tool services > type level name id state > dlm 1 rgmanager 00010002 FAIL_ALL_STOPPED > [1 2 3] > > Chris Caron > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From pbruna at it-linux.cl Mon Aug 4 17:30:04 2008 From: pbruna at it-linux.cl (Patricio A. Bruna) Date: Mon, 4 Aug 2008 13:30:04 -0400 (CLT) Subject: [Linux-cluster] GFS and Directory with lots of small files Message-ID: <23892998.97721217871004822.JavaMail.root@lisa.itlinux.cl> Hi, I had found on the list that i can improve the performance of GFS with small files if i adapt the size of the rsbtbl_size/lkbtbl_size values. But it also found that this has to be done after loading the dlm module, but before the lockspace is created. What means "before the lockspace is created", before the GFS partitions are mounted? How do i do this? PD: I send this same email to antoher list by mistake. ------------------------------------ Patricio Bruna V. IT Linux Ltda. http://www.it-linux.cl Fono : (+56-2) 333 0578 - Chile Fono: (+54-11) 6632 2760 - Argentina M?vil : (+56-09) 8827 0342 -------------- next part -------------- An HTML attachment was scrubbed... URL: From grimme at atix.de Mon Aug 4 20:57:07 2008 From: grimme at atix.de (Marc Grimme) Date: Mon, 4 Aug 2008 22:57:07 +0200 Subject: [Linux-cluster] GFS and Directory with lots of small files In-Reply-To: <23892998.97721217871004822.JavaMail.root@lisa.itlinux.cl> References: <23892998.97721217871004822.JavaMail.root@lisa.itlinux.cl> Message-ID: <200808042257.07193.grimme@atix.de> On Monday 04 August 2008 19:30:04 Patricio A. Bruna wrote: > Hi, > I had found on the list that i can improve the performance of GFS with > small files if i adapt the size of the rsbtbl_size/lkbtbl_size values. But > it also found that this has to be done after loading the dlm module, but > before the lockspace is created. What means "before the lockspace is > created", before the GFS partitions are mounted? > > How do i do this? Umount gfs fs. Add changes to the proc-fs in a resource skript that is startet before gfs is mounted and apply it. Then remount the gfs and there you go. The lockspaces will get created for every filesystem when the filesystem is mounted. -marc. > > PD: I send this same email to antoher list by mistake. > > ------------------------------------ > Patricio Bruna V. > IT Linux Ltda. 
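Marc's sequence (unmount, raise the dlm table sizes, then remount so the new lockspace picks them up) can be scripted as a small pre-mount step. A rough sketch only: the tunable locations differ between releases (RHEL4-era dlm exposes them under /proc/cluster/config/dlm/, RHEL5-era dlm under configfs), and the value 1024 is just an example:

  #!/bin/sh
  # example pre-mount step: enlarge the dlm hash tables before any GFS
  # lockspace is created (i.e. before the first GFS mount on this node)
  for t in rsbtbl_size lkbtbl_size dirtbl_size; do
      if [ -e /sys/kernel/config/dlm/cluster/$t ]; then
          echo 1024 > /sys/kernel/config/dlm/cluster/$t      # RHEL5-style configfs
      elif [ -e /proc/cluster/config/dlm/$t ]; then
          echo 1024 > /proc/cluster/config/dlm/$t            # RHEL4-style procfs
      fi
  done
  # now (re)mount the GFS filesystems, e.g. via the gfs init script or fstab
  mount -a -t gfs

Run this on every node before its GFS mounts, for example from an init script ordered before gfs/gfs2 or from the resource script Marc mentions.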
> http://www.it-linux.cl > Fono : (+56-2) 333 0578 - Chile > Fono: (+54-11) 6632 2760 - Argentina > M?vil : (+56-09) 8827 0342 -- Gruss / Regards, Marc Grimme Phone: +49-89 452 3538-14 http://www.atix.de/ http://www.open-sharedroot.org/ ** ATIX Informationstechnologie und Consulting AG Einsteinstr. 10 85716 Unterschleissheim Deutschland/Germany Phone: +49-89 452 3538-0 Fax: +49-89 990 1766-0 Registergericht: Amtsgericht Muenchen Registernummer: HRB 168930 USt.-Id.: DE209485962 Vorstand: Marc Grimme, Mark Hlawatschek, Thomas Merz (Vors.) Vorsitzender des Aufsichtsrats: Dr. Martin Buss From schlegel at riege.com Mon Aug 4 22:16:56 2008 From: schlegel at riege.com (Gunther Schlegel) Date: Tue, 05 Aug 2008 00:16:56 +0200 Subject: [Linux-cluster] How to fence a virtual machine in a virtual cluster? In-Reply-To: <200808042257.07193.grimme@atix.de> References: <23892998.97721217871004822.JavaMail.root@lisa.itlinux.cl> <200808042257.07193.grimme@atix.de> Message-ID: <48977FD8.7000508@riege.com> Hi, I am running RHEL Virtual Machines as cluster services on RedHat 5.2 Dom0 nodes. The virtual machines use clustered logical volumes for storage, /etc/xen is located on a gfs filesystem. Cluster management using luci from a dedicated admin server. (The entire system works quite well. Some load balancing mechanism on the Dom0 nodes would be fine, but that is another issue...) Now I need a second cluster, in fact I need a gfs filesystem shared among some of the virtual machines. In general this should not be an issue, but how can I fence a virtual machine inside of a virtual cluster? Technically 'virsh destroy' on the Dom0 host will do the job. Though: a) I cannot define a script for fencing (at least using luci). b) There is a fencing method for virtual machines in the RHEL 5.2 cluster, but it is only meant to fence virtual nodes that are part of a mixed cluster of physical and virtual nodes. c) Inside a virtual cluster the "Virtual Machine Fencing" is of no use, because the virtual machine itself is a service in *another* cluster. One would need an option to define the Dom0 host or cluster. I somehow object to mix physical with virtual machines inside of a cluster (and I do not want to take virtual machines part in the quorum of the physical machines. Hypothetically the virtual machines may fence the physical nodes, thereby shutting down the entire cluster...) . The Dom0-cluster is intended to run VMs only and no other services. The VMs are to provide services, and if they need cluster services, I prefer to define aditional clusters. Am I missing something? In fact the ability to define a script for fencing would be sufficient from my point of view. Or is the only real solution to join the VMs in the Dom0 cluster and assign a dedicated failover group to them? any hint is highly appreciated. best regards, Gunther -- ............................................................. Riege Software International GmbH Fon: +49 (2159) 9148 0 Mollsfeld 10 Fax: +49 (2159) 9148 11 40670 Meerbusch Web: www.riege.com Germany E-Mail: schlegel at riege.com --- --- Handelsregister: Managing Directors: Amtsgericht Neuss HRB-NR 4207 Christian Riege USt-ID-Nr.: DE120585842 Gabriele Riege Johannes Riege ............................................................. YOU CARE FOR FREIGHT, WE CARE FOR YOU -------------- next part -------------- A non-text attachment was scrubbed... 
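On point (a): fenced does not need a special "script" device type, because any executable can act as a fence agent; the daemon runs the agent named in cluster.conf and feeds it the device attributes as key=value lines on stdin. A minimal, hypothetical wrapper along the lines Gunther describes (ssh to the Dom0 and virsh destroy the guest) might look like the sketch below. The key names, host names and passwordless-ssh setup are assumptions, not a supported agent:

  #!/bin/sh
  # hypothetical fence agent: destroy/start a DomU on its Dom0 over ssh
  # fenced passes the cluster.conf device attributes as key=value lines on stdin
  DOM0="" DOMU="" ACTION="off"
  while read line; do
      case "$line" in
          ipaddr=*)          DOM0=${line#ipaddr=} ;;   # Dom0 hosting the guest
          port=*)            DOMU=${line#port=} ;;     # DomU name known to xen/libvirt
          option=*|action=*) ACTION=${line##*=} ;;
      esac
  done
  case "$ACTION" in
      on) exec ssh root@"$DOM0" "virsh start $DOMU" ;;
      *)  exec ssh root@"$DOM0" "virsh destroy $DOMU" ;;
  esac

One obvious design problem, and one of the things fence_xvm/fence_xvmd tries to address, is that a DomU managed as a cluster service can migrate, so a static ipaddr= pointing at one Dom0 may end up fencing against the wrong host. That is worth keeping in mind before relying on a wrapper like this.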
Name: schlegel.vcf Type: text/x-vcard Size: 344 bytes Desc: not available URL: From Bevan.Broun at ardec.com.au Mon Aug 4 23:11:49 2008 From: Bevan.Broun at ardec.com.au (Bevan Broun) Date: Tue, 5 Aug 2008 09:11:49 +1000 Subject: [Linux-cluster] luci cant perform cluster actions. How to debug? In-Reply-To: <48977FD8.7000508@riege.com> Message-ID: <6008E5CED89FD44A86D3C376519E1DB2010ACB4260@megatron.ms.a2end.com> Hi All I have cluster where luci/ricci seems partially broken. Luci reports the cluster members and cluster state but it will not have a node leave or shutdown the cluster. Performing shutdown/startup of cluster via the init scripts all works. We had changed the node names manually in cluster.conf from fully qualified host names to IP addresses (with change added to cman init script to make this work). That's about the only thing I can think that may have caused the issue. Would a non functioning DNS system cause issues? How can I go about debugging where the issue is? Thanks Bevan Broun Solutions Architect Ardec International http://www.ardec.com.au http://www.lisasoft.com http://www.terrapages.com Sydney ----------------------- Suite 112,The Lower Deck 19-21 Jones Bay Wharf Pirrama Road, Pyrmont 2009 Ph: +61 2 8570 5000 Fax: +61 2 8570 5099 From satoru.satoh at gmail.com Tue Aug 5 04:02:03 2008 From: satoru.satoh at gmail.com (Satoru SATOH) Date: Tue, 5 Aug 2008 13:02:03 +0900 Subject: [Linux-cluster] [PATCH] Add network interface select option for fence_xvmd Message-ID: <20080805040202.GA14134@gescom.nrt.redhat.com> Hello, # I sent this before but it looks disappered somewhere so that resend it # again. Excuse me if you received the same mail twice. It should be useful that fence_xvmd listen on a certain network interface which manually specified under some conditions such as a system has multiple network interfaces and the one to default route is not prefered choice, I think. The following patch adds the option "-I " to select network interface fence_xvmd to listen on. 
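As a usage sketch (assuming the patch is applied): the interface name below is only an example, and -a/-p merely restate the documented defaults, but starting the daemon as

    fence_xvmd -I eth1 -a 225.0.0.12 -p 1229

would join the multicast group on eth1 instead of leaving the choice of interface to the kernel's routing table.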
- satoru fence/agents/xvm/fence_xvmd.c | 8 ++++---- fence/agents/xvm/mcast.c | 21 ++++++++++++++++++--- fence/agents/xvm/mcast.h | 4 ++-- fence/agents/xvm/options.c | 13 +++++++++++++ fence/agents/xvm/options.h | 1 + fence/man/fence_xvmd.8 | 3 +++ 6 files changed, 41 insertions(+), 9 deletions(-) diff --git a/fence/agents/xvm/fence_xvmd.c b/fence/agents/xvm/fence_xvmd.c index 888f24b..1dc5eba 100644 --- a/fence/agents/xvm/fence_xvmd.c +++ b/fence/agents/xvm/fence_xvmd.c @@ -921,7 +921,7 @@ main(int argc, char **argv) unsigned int logmode = 0; char key[MAX_KEY_LEN]; int key_len = 0, x; - char *my_options = "dfi:a:p:C:U:c:k:u?hLXV"; + char *my_options = "dfi:a:I:p:C:U:c:k:u?hLXV"; cman_handle_t ch = NULL; void *h = NULL; @@ -1031,9 +1031,9 @@ main(int argc, char **argv) } if (args.family == PF_INET) - mc_sock = ipv4_recv_sk(args.addr, args.port); + mc_sock = ipv4_recv_sk(args.addr, args.port, args.ifindex); else - mc_sock = ipv6_recv_sk(args.addr, args.port); + mc_sock = ipv6_recv_sk(args.addr, args.port, args.ifindex); if (mc_sock < 0) { log_printf(LOG_ERR, "Could not set up multicast listen socket\n"); @@ -1049,5 +1049,5 @@ main(int argc, char **argv) //malloc_dump_table(); - return 0; + exit(errno); } diff --git a/fence/agents/xvm/mcast.c b/fence/agents/xvm/mcast.c index db46328..001e3ac 100644 --- a/fence/agents/xvm/mcast.c +++ b/fence/agents/xvm/mcast.c @@ -31,11 +31,12 @@ LOGSYS_DECLARE_SUBSYS ("XVM", SYSLOGLEVEL); Sets up a multicast receive socket */ int -ipv4_recv_sk(char *addr, int port) +ipv4_recv_sk(char *addr, int port, unsigned int ifindex) { int sock; struct ip_mreq mreq; struct sockaddr_in sin; + struct ifreq ifreq; /* Store multicast address */ if (inet_pton(PF_INET, addr, @@ -74,7 +75,20 @@ ipv4_recv_sk(char *addr, int port) * Join multicast group */ /* mreq.imr_multiaddr.s_addr is set above */ - mreq.imr_interface.s_addr = htonl(INADDR_ANY); + if (ifindex > 0 && if_indextoname(ifindex, ifreq.ifr_name) != NULL) { + ifreq.ifr_addr.sa_family = AF_INET; + if (ioctl(sock, SIOCGIFADDR, &ifreq) < 0) { + printf("Failed to get address of the interface %d\n", + ifindex); + mreq.imr_interface.s_addr = htonl(INADDR_ANY); + } else { + memcpy(&mreq.imr_interface, + &((struct sockaddr_in *) &ifreq.ifr_addr)->sin_addr, + sizeof(struct in_addr)); + } + } else { + mreq.imr_interface.s_addr = htonl(INADDR_ANY); + } dbg_printf(4, "Joining multicast group\n"); if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) == -1) { @@ -184,7 +198,7 @@ ipv4_send_sk(char *send_addr, char *addr, int port, struct sockaddr *tgt, Sets up a multicast receive (ipv6) socket */ int -ipv6_recv_sk(char *addr, int port) +ipv6_recv_sk(char *addr, int port, unsigned int ifindex) { int sock, val; struct ipv6_mreq mreq; @@ -203,6 +217,7 @@ ipv6_recv_sk(char *addr, int port) memcpy(&mreq.ipv6mr_multiaddr, &sin.sin6_addr, sizeof(struct in6_addr)); + mreq.ipv6mr_interface = (ifindex > 0) ? 
ifindex : 0; /******************************** * SET UP MULTICAST RECV SOCKET * diff --git a/fence/agents/xvm/mcast.h b/fence/agents/xvm/mcast.h index 5113f04..08fd6de 100644 --- a/fence/agents/xvm/mcast.h +++ b/fence/agents/xvm/mcast.h @@ -4,11 +4,11 @@ #define IPV4_MCAST_DEFAULT "225.0.0.12" #define IPV6_MCAST_DEFAULT "ff05::3:1" -int ipv4_recv_sk(char *addr, int port); +int ipv4_recv_sk(char *addr, int port, unsigned int ifindex); int ipv4_send_sk(char *src_addr, char *addr, int port, struct sockaddr *src, socklen_t slen, int ttl); -int ipv6_recv_sk(char *addr, int port); +int ipv6_recv_sk(char *addr, int port, unsigned int ifindex); int ipv6_send_sk(char *src_addr, char *addr, int port, struct sockaddr *src, socklen_t slen, int ttl); diff --git a/fence/agents/xvm/options.c b/fence/agents/xvm/options.c index 969ca8d..519f57e 100644 --- a/fence/agents/xvm/options.c +++ b/fence/agents/xvm/options.c @@ -82,6 +82,13 @@ assign_address(fence_xvm_args_t *args, struct arg_info *arg, char *value) static inline void +assign_interface(fence_xvm_args_t *args, struct arg_info *arg, char *value) +{ + args->ifindex = if_nametoindex(value); +} + + +static inline void assign_ttl(fence_xvm_args_t *args, struct arg_info *arg, char *value) { int ttl; @@ -299,6 +306,10 @@ static struct arg_info _arg_info[] = { "Multicast address (default=225.0.0.12 / ff02::3:1)", assign_address }, + { 'I', "-I ", NULL, + "Network interface to listen on (default=auto; kernel selects)", + assign_interface }, + { 'T', "-T ", "multicast_ttl", "Multicast time-to-live (in hops; default=2)", assign_ttl }, @@ -422,6 +433,7 @@ args_init(fence_xvm_args_t *args) args->flags = 0; args->debug = 0; args->ttl = DEFAULT_TTL; + args->ifindex = 0; } @@ -439,6 +451,7 @@ args_print(fence_xvm_args_t *args) { dbg_printf(1, "-- args @ %p --\n", args); _pr_str(args->addr); + _pr_int(args->ifindex); _pr_str(args->domain); _pr_str(args->key_file); _pr_int(args->op); diff --git a/fence/agents/xvm/options.h b/fence/agents/xvm/options.h index 7a2dcca..8720366 100644 --- a/fence/agents/xvm/options.h +++ b/fence/agents/xvm/options.h @@ -29,6 +29,7 @@ typedef struct { arg_flags_t flags; int debug; int ttl; + unsigned int ifindex; } fence_xvm_args_t; /* Private structure for commandline / stdin fencing args */ diff --git a/fence/man/fence_xvmd.8 b/fence/man/fence_xvmd.8 index 5a47211..05d4720 100644 --- a/fence/man/fence_xvmd.8 +++ b/fence/man/fence_xvmd.8 @@ -36,6 +36,9 @@ IP family to use (auto, ipv4, or ipv6; default = auto) Multicast address to listen on (default=225.0.0.12 for ipv4, ff02::3:1 for ipv6) .TP +\fB-I\fP \fIinterface\fP +Network interface to use; e.g. eth0 (default: one[s] kernel choosed) +.TP \fB-p\fP \fIport\fP Port to use (default=1229) .TP From satoru.satoh at gmail.com Tue Aug 5 05:06:30 2008 From: satoru.satoh at gmail.com (Satoru SATOH) Date: Tue, 5 Aug 2008 14:06:30 +0900 Subject: [Linux-cluster] [PATCH] trivial debug print fix for fence/agents/xvm/ Message-ID: <20080805050628.GC14134@gescom.nrt.redhat.com> Hello, Here is a trivial patch to add missing line breaks in some debug print lines for fence_xvm*. 
- satoru fence/agents/xvm/simple_auth.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fence/agents/xvm/simple_auth.c b/fence/agents/xvm/simple_auth.c index 07bc261..a1f04f4 100644 --- a/fence/agents/xvm/simple_auth.c +++ b/fence/agents/xvm/simple_auth.c @@ -381,7 +381,7 @@ read_key_file(char *file, char *key, size_t max_len) } if (nread == 0) { - dbg_printf(3, "Stopped reading @ %d bytes", + dbg_printf(3, "Stopped reading @ %d bytes\n", (int)max_len-remain); break; } @@ -391,7 +391,7 @@ read_key_file(char *file, char *key, size_t max_len) } close(fd); - dbg_printf(3, "Actual key length = %d bytes", (int)max_len-remain); + dbg_printf(3, "Actual key length = %d bytes\n", (int)max_len-remain); return (int)(max_len - remain); } From pedroche5 at gmail.com Tue Aug 5 09:44:47 2008 From: pedroche5 at gmail.com (Pedro Gonzalez Zamora) Date: Tue, 5 Aug 2008 11:44:47 +0200 Subject: [Linux-cluster] How can I re-assign cluster id Message-ID: <47311dd20808050244w1de4d3c4i4e4cb14f6ba2bde5@mail.gmail.com> Dear all I have two clusters each cluster has two nodes, the first cluster1 starts ok but de second cluster2 can't start because it gets the same cluster ID that cluster1 and I don't know why?? I have set diferent cluster name in cluster.conf. Best Regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Tue Aug 5 10:02:43 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 05 Aug 2008 11:02:43 +0100 Subject: [Linux-cluster] How can I re-assign cluster id In-Reply-To: <47311dd20808050244w1de4d3c4i4e4cb14f6ba2bde5@mail.gmail.com> References: <47311dd20808050244w1de4d3c4i4e4cb14f6ba2bde5@mail.gmail.com> Message-ID: <48982543.6030607@redhat.com> Pedro Gonzalez Zamora wrote: > Dear all > > > I have two clusters each cluster has two nodes, the first cluster1 > starts ok but de second cluster2 can't start because it gets the same > cluster ID that cluster1 and I don't know why?? > I have set diferent cluster name in cluster.conf. > It's probably that you've hit a weakness with the cluster name hash, it's not perfect by any means. Your options are to change one of the cluster names so that they hash to different values or (easier) add > the DRAC/MC). This is like a mix of the Dell DRAC/MC and DRAC 5 in >> fence_drac. >> >> I've written a patch that adds support for the CMC to fence_drac. This >> is my first patch ever using git, so hopefully it's good for you. >> >> This has been tested on a CMC, but it also changes the code for a Dell >> 1950. I'm going to get a 1950 and test it tomorrow. >> >> Feedback welcomed! > THANK YOU. SINCERELY. Please update us with test results. If no > regressions pop up, this is going into the agent ASAP. > > THANK YOU. 
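For anyone wanting to exercise the patched agent by hand before it is merged, a manual invocation along these lines should show the new behaviour (the address, credentials and module name below are invented, and -o status relies on the companion patch that adds the status action):

    fence_drac -a 10.0.0.20 -l root -p calvin -m Server-1 -o status

As with the DRAC/MC, the -m module name is required for the CMC, since a single management address fronts several blades.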
> > :) > > -Jim, who often feels fenced in > >> >> David >> >> --- >> fence/agents/drac/fence_drac.pl | 36 +++++++++++++++++++++++++++++------- >> 1 files changed, 29 insertions(+), 7 deletions(-) >> >> diff --git a/fence/agents/drac/fence_drac.pl b/fence/agents/drac/fence_drac.pl >> index f199814..f96ef22 100644 >> --- a/fence/agents/drac/fence_drac.pl >> +++ b/fence/agents/drac/fence_drac.pl >> @@ -38,6 +38,7 @@ my $DRAC_VERSION_MC = 'DRAC/MC'; >> my $DRAC_VERSION_4I = 'DRAC 4/I'; >> my $DRAC_VERSION_4P = 'DRAC 4/P'; >> my $DRAC_VERSION_5 = 'DRAC 5'; >> +my $DRAC_VERSION_CMC = 'CMC'; >> >> my $PWR_CMD_SUCCESS = "/^OK/"; >> my $PWR_CMD_SUCCESS_DRAC5 = "/^Server power operation successful$/"; >> @@ -192,10 +193,15 @@ sub login >> # DRAC5 prints version controller version info >> # only after you've logged in. >> if ($drac_version eq $DRAC_VERSION_UNKNOWN) { >> - if ($t->waitfor(Match => "/.*\($DRAC_VERSION_5\)/m")) { >> + >> + if (my ($prematch,$match)=$t->waitfor(Match => >> "/.*(\($DRAC_VERSION_5\)|$DRAC_VERSION_CMC)/m")) { >> + if ($match=~/$DRAC_VERSION_CMC/) { >> + $drac_version = $DRAC_VERSION_CMC; >> + } else { >> $drac_version = $DRAC_VERSION_5; >> + } >> $cmd_prompt = "/\\\$ /"; >> - $PWR_CMD_SUCCESS = $PWR_CMD_SUCCESS_DRAC5; >> + $PWR_CMD_SUCCESS = $PWR_CMD_SUCCESS_DRAC5; >> } else { >> print "WARNING: unable to detect DRAC version '$_'\n"; >> } >> @@ -228,8 +234,10 @@ sub set_power_status >> } >> elsif ($drac_version eq $DRAC_VERSION_5) { >> $cmd = "racadm serveraction $svr_action"; >> - } else >> - { >> + } >> + elsif ($drac_version eq $DRAC_VERSION_CMC) { >> + $cmd = "racadm serveraction -m $modulename $svr_action"; >> + } else { >> $cmd = "serveraction -d 0 $svr_action"; >> } >> >> @@ -271,6 +279,11 @@ sub set_power_status >> } >> } >> fail "failed: unexpected response: '$err'" if defined $err; >> + >> + # on M600 blade systems, after power on or power off, status takes a >> couple of seconds to report correctly. 
Wait here before checking >> status again >> + sleep 5; >> + >> + >> } >> >> >> @@ -285,6 +298,8 @@ sub get_power_status >> >> if ($drac_version eq $DRAC_VERSION_5) { >> $cmd = "racadm serveraction powerstatus"; >> + } elsif ($drac_version eq $DRAC_VERSION_CMC) { >> + $cmd = "racadm serveraction powerstatus -m $modulename"; >> } else { >> $cmd = "getmodinfo"; >> } >> @@ -306,7 +321,7 @@ sub get_power_status >> >> fail "failed: unkown dialog exception: '$_'" unless (/^$cmd$/); >> >> - if ($drac_version ne $DRAC_VERSION_5) { >> + if ($drac_version ne $DRAC_VERSION_5 && $drac_version ne $DRAC_VERSION_CMC) { >> #Expect: >> # # >> # 1 ----> chassis Present ON Normal CQXYV61 >> @@ -335,6 +350,11 @@ sub get_power_status >> if(m/^Server power status: (\w+)/) { >> $status = lc($1); >> } >> + } >> + elsif ($drac_version eq $DRAC_VERSION_CMC) { >> + if(m/^(\w+)/) { >> + $status = lc($1); >> + } >> } else { >> my ($group,$arrow,$module,$presence,$pwrstate,$health, >> $svctag,$junk) = split /\s+/; >> @@ -364,7 +384,8 @@ sub get_power_status >> } >> >> $_=$status; >> - if(/^(on|off)$/i) >> + >> + if (/^(on|off)$/i) >> { >> # valid power states >> } >> @@ -440,6 +461,7 @@ sub do_action >> } >> >> set_power_status on; >> + >> fail "failed: $_" unless wait_power_status on; >> >> msg "success: powered on"; >> @@ -641,7 +663,7 @@ if ($drac_version eq $DRAC_VERSION_III_XT) >> fail "failed: option 'modulename' not compatilble with DRAC version >> '$drac_version'" >> if defined $modulename; >> } >> -elsif ($drac_version eq $DRAC_VERSION_MC) >> +elsif ($drac_version eq $DRAC_VERSION_MC || $drac_version eq $DRAC_VERSION_CMC) >> { >> fail "failed: option 'modulename' required for DRAC version '$drac_version'" >> unless defined $modulename; >> -- >> 1.5.5.1 >> >> >> >From 2899ae4468a69b89346cafba13022a9b214404f2 Mon Sep 17 00:00:00 2001 >> From: David J Craigon >> Date: Wed, 30 Jul 2008 16:34:24 +0100 >> Subject: Add a comment to state the CMC version this script works on >> >> --- >> fence/agents/drac/fence_drac.pl | 1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/fence/agents/drac/fence_drac.pl b/fence/agents/drac/fence_drac.pl >> index f96ef22..11cc771 100644 >> --- a/fence/agents/drac/fence_drac.pl >> +++ b/fence/agents/drac/fence_drac.pl >> @@ -13,6 +13,7 @@ >> # PowerEdge 1850 DRAC 4/I 1.35 (Build 09.27) >> # PowerEdge 1850 DRAC 4/I 1.40 (Build 08.24) >> # PowerEdge 1950 DRAC 5 1.0 (Build 06.05.12) >> +# PowerEdge M600 CMC 1.01.A05.200803072107 >> # >> >> use Getopt::Std; > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From pedroche5 at gmail.com Tue Aug 5 11:03:31 2008 From: pedroche5 at gmail.com (Pedro Gonzalez Zamora) Date: Tue, 5 Aug 2008 13:03:31 +0200 Subject: [Linux-cluster] How can I re-assign cluster id In-Reply-To: <48982543.6030607@redhat.com> References: <47311dd20808050244w1de4d3c4i4e4cb14f6ba2bde5@mail.gmail.com> <48982543.6030607@redhat.com> Message-ID: <47311dd20808050403h7c9ef563re09178bbc47a6eb5@mail.gmail.com> Dear Christine I have set and I trying again but I get this error: cman: unable to set cluster_id Could you tell me please more about cluster name hash, how it works and how can I change the values? Best Regards 2008/8/5 Christine Caulfield > Pedro Gonzalez Zamora wrote: > >> Dear all >> >> >> I have two clusters each cluster has two nodes, the first cluster1 starts >> ok but de second cluster2 can't start because it gets the same cluster ID >> that cluster1 and I don't know why?? 
>> I have set diferent cluster name in cluster.conf. >> >> > It's probably that you've hit a weakness with the cluster name hash, it's > not perfect by any means. Your options are to change one of the cluster > names so that they hash to different values or (easier) add > > debugging output file\n"; print " -h usage\n"; print " -l Login name\n"; - print " -m DRAC/MC module name\n"; - print " -o Action: reboot (default), off or on\n"; + print " -m DRAC/MC or CMC module name\n"; + print " -o Action: reboot (default), off, on or status\n"; print " -p Login password\n"; print " -S Script to run to retrieve password\n"; print " -q quiet mode\n"; -- 1.5.5.1 From david at craigon.co.uk Tue Aug 5 11:04:42 2008 From: david at craigon.co.uk (David J Craigon) Date: Tue, 5 Aug 2008 12:04:42 +0100 Subject: [Linux-cluster] [iDRAC/ Dell M600 2/3] Add a comment to state the CMC version this script works on In-Reply-To: <1217934283-10326-1-git-send-email-david@craigon.co.uk> References: <1217934283-10326-1-git-send-email-david@craigon.co.uk> Message-ID: <1217934283-10326-2-git-send-email-david@craigon.co.uk> --- fence/agents/drac/fence_drac.pl | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fence/agents/drac/fence_drac.pl b/fence/agents/drac/fence_drac.pl index f96ef22..11cc771 100644 --- a/fence/agents/drac/fence_drac.pl +++ b/fence/agents/drac/fence_drac.pl @@ -13,6 +13,7 @@ # PowerEdge 1850 DRAC 4/I 1.35 (Build 09.27) # PowerEdge 1850 DRAC 4/I 1.40 (Build 08.24) # PowerEdge 1950 DRAC 5 1.0 (Build 06.05.12) +# PowerEdge M600 CMC 1.01.A05.200803072107 # use Getopt::Std; -- 1.5.5.1 From david at craigon.co.uk Tue Aug 5 11:04:41 2008 From: david at craigon.co.uk (David J Craigon) Date: Tue, 5 Aug 2008 12:04:41 +0100 Subject: [Linux-cluster] [iDRAC/ Dell M600 1/3] Fencing support for Dell M600 CMC (a DRAC in diguise) In-Reply-To: <1217451390.3371.3.camel@localhost.localdomain> References: <1217451390.3371.3.camel@localhost.localdomain> Message-ID: <1217934283-10326-1-git-send-email-david@craigon.co.uk> --- fence/agents/drac/fence_drac.pl | 36 +++++++++++++++++++++++++++++------- 1 files changed, 29 insertions(+), 7 deletions(-) diff --git a/fence/agents/drac/fence_drac.pl b/fence/agents/drac/fence_drac.pl index f199814..f96ef22 100644 --- a/fence/agents/drac/fence_drac.pl +++ b/fence/agents/drac/fence_drac.pl @@ -38,6 +38,7 @@ my $DRAC_VERSION_MC = 'DRAC/MC'; my $DRAC_VERSION_4I = 'DRAC 4/I'; my $DRAC_VERSION_4P = 'DRAC 4/P'; my $DRAC_VERSION_5 = 'DRAC 5'; +my $DRAC_VERSION_CMC = 'CMC'; my $PWR_CMD_SUCCESS = "/^OK/"; my $PWR_CMD_SUCCESS_DRAC5 = "/^Server power operation successful$/"; @@ -192,10 +193,15 @@ sub login # DRAC5 prints version controller version info # only after you've logged in. 
if ($drac_version eq $DRAC_VERSION_UNKNOWN) { - if ($t->waitfor(Match => "/.*\($DRAC_VERSION_5\)/m")) { + + if (my ($prematch,$match)=$t->waitfor(Match => "/.*(\($DRAC_VERSION_5\)|$DRAC_VERSION_CMC)/m")) { + if ($match=~/$DRAC_VERSION_CMC/) { + $drac_version = $DRAC_VERSION_CMC; + } else { $drac_version = $DRAC_VERSION_5; + } $cmd_prompt = "/\\\$ /"; - $PWR_CMD_SUCCESS = $PWR_CMD_SUCCESS_DRAC5; + $PWR_CMD_SUCCESS = $PWR_CMD_SUCCESS_DRAC5; } else { print "WARNING: unable to detect DRAC version '$_'\n"; } @@ -228,8 +234,10 @@ sub set_power_status } elsif ($drac_version eq $DRAC_VERSION_5) { $cmd = "racadm serveraction $svr_action"; - } else - { + } + elsif ($drac_version eq $DRAC_VERSION_CMC) { + $cmd = "racadm serveraction -m $modulename $svr_action"; + } else { $cmd = "serveraction -d 0 $svr_action"; } @@ -271,6 +279,11 @@ sub set_power_status } } fail "failed: unexpected response: '$err'" if defined $err; + + # on M600 blade systems, after power on or power off, status takes a couple of seconds to report correctly. Wait here before checking status again + sleep 5; + + } @@ -285,6 +298,8 @@ sub get_power_status if ($drac_version eq $DRAC_VERSION_5) { $cmd = "racadm serveraction powerstatus"; + } elsif ($drac_version eq $DRAC_VERSION_CMC) { + $cmd = "racadm serveraction powerstatus -m $modulename"; } else { $cmd = "getmodinfo"; } @@ -306,7 +321,7 @@ sub get_power_status fail "failed: unkown dialog exception: '$_'" unless (/^$cmd$/); - if ($drac_version ne $DRAC_VERSION_5) { + if ($drac_version ne $DRAC_VERSION_5 && $drac_version ne $DRAC_VERSION_CMC) { #Expect: # # # 1 ----> chassis Present ON Normal CQXYV61 @@ -335,6 +350,11 @@ sub get_power_status if(m/^Server power status: (\w+)/) { $status = lc($1); } + } + elsif ($drac_version eq $DRAC_VERSION_CMC) { + if(m/^(\w+)/) { + $status = lc($1); + } } else { my ($group,$arrow,$module,$presence,$pwrstate,$health, $svctag,$junk) = split /\s+/; @@ -364,7 +384,8 @@ sub get_power_status } $_=$status; - if(/^(on|off)$/i) + + if (/^(on|off)$/i) { # valid power states } @@ -440,6 +461,7 @@ sub do_action } set_power_status on; + fail "failed: $_" unless wait_power_status on; msg "success: powered on"; @@ -641,7 +663,7 @@ if ($drac_version eq $DRAC_VERSION_III_XT) fail "failed: option 'modulename' not compatilble with DRAC version '$drac_version'" if defined $modulename; } -elsif ($drac_version eq $DRAC_VERSION_MC) +elsif ($drac_version eq $DRAC_VERSION_MC || $drac_version eq $DRAC_VERSION_CMC) { fail "failed: option 'modulename' required for DRAC version '$drac_version'" unless defined $modulename; -- 1.5.5.1 From ccaulfie at redhat.com Tue Aug 5 11:44:16 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 05 Aug 2008 12:44:16 +0100 Subject: [Linux-cluster] How can I re-assign cluster id In-Reply-To: <47311dd20808050403h7c9ef563re09178bbc47a6eb5@mail.gmail.com> References: <47311dd20808050244w1de4d3c4i4e4cb14f6ba2bde5@mail.gmail.com> <48982543.6030607@redhat.com> <47311dd20808050403h7c9ef563re09178bbc47a6eb5@mail.gmail.com> Message-ID: <48983D10.5000308@redhat.com> Pedro Gonzalez Zamora wrote: > Dear Christine > > I have set and I trying again but I get this > error: > > cman: unable to set cluster_id > > Could you tell me please more about cluster name hash, how it works and > how can I change the values? > It sounds like you must have a rather old RHEL4 installation - the cluster_id changing code has been in there for a very long time now. 
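For reference, the override being discussed is an explicit cluster_id attribute on the cman element of cluster.conf, which bypasses the name hash entirely. An illustrative fragment (the numeric id is arbitrary; it only has to differ between the two clusters):

    <cluster name="cluster2" config_version="2">
      <cman cluster_id="102"/>
      ...
    </cluster>

Bump config_version, propagate the file to both nodes and restart cman for the change to take effect.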
The 'trick' to making the cluster names hash to unique values is simply to make the names very different really. Avoid long, similar, names that end in numbers for example. > > 2008/8/5 Christine Caulfield > > > Pedro Gonzalez Zamora wrote: > > Dear all > > > I have two clusters each cluster has two nodes, the first > cluster1 starts ok but de second cluster2 can't start because it > gets the same cluster ID that cluster1 and I don't know why?? > I have set diferent cluster name in cluster.conf. > > > It's probably that you've hit a weakness with the cluster name hash, > it's not perfect by any means. Your options are to change one of the > cluster names so that they hash to different values or (easier) add > > ... 10 more entries ... ... 10 more entries ... I also wrote a wrapper script named "power" around fence_ilo for testing, and for other maintenance scripts, i.e., power reboot net1 #!/bin/bash # # rev 07-May-2007 RHurst # CHOICES=( "off on reboot status" ) COMMAND=$1 while [ -z "${COMMAND}" ]; do echo -n "Command (${CHOICES[@]})? " read COMMAND [ -z "${COMMAND}" ] && exit done HOST=${2} HOSTIP="`dig +short ${HOST}ilo.cad.rack`" while [ -z "${HOSTIP}" ]; do echo -n "host? " read HOST [ -z "${HOST}" ] && exit HOSTIP="`dig +short ${HOST}ilo.cad.rack`" done PASSWD= [ "${HOST:0:3}" = "net" ] && PASSWD="cad${HOST}tendac" [ "${HOST:0:3}" = "app" ] && PASSWD="cad${HOST}ppadac" [ "${HOST:0:2}" = "db" ] && PASSWD="cad${HOST}bddac" [ -z "${PASSWD}" ] && exit [ $# -lt 2 ] && echo -n "Sending '${COMMAND}' to ${HOSTIP} iLO : " fence_ilo -a ${HOSTIP} -l Administrator -p ${PASSWD} -o ${COMMAND} ________________________________________________________________________ ??Robert Hurst, Sr. Cach? Administrator Beth Israel Deaconess Medical Center 1135 Tremont Street, REN-7 Boston, Massachusetts 02120-2140 617-754-8754 ? Fax: 617-754-8730 ? Cell: 401-787-3154 Any technology distinguishable from magic is insufficiently advanced. On Mon, 2008-08-04 at 10:11 +0200, Brett Cave wrote: > On Fri, Aug 1, 2008 at 12:06 PM, Balaji wrote: > > Dear All, > > > > Currently i am using HP x6600 Server and I have installed RHEL4 Update 4 AS > > Linux and > > RHEL4 Update 4 Support Cluster Suite in my server > > I am new in fence and can any one help me how to configure HP ILO fence in > > my server > > and HP ILO Fence Functionality > > I have just set it up, have not tested 100%, but what I have so far is: > 1) create fence usernames and passwords ILO on each of your devices. > 2) Update cluster.conf as follows: > > > > > > > > > > > hostname="192.168.0.101" login="fence" passwd="fencepassword"/> > > > > According to the docs, that SHOULD work, I am still having hanging > issues on access to certain files / directories on GFS, but still > pretty new to it, so not 100% sure whether its related to fencing or > not. > > > Regards > > -S.Balaji > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2178 bytes Desc: not available URL: From bobby.m.dalton at nasa.gov Tue Aug 5 21:56:04 2008 From: bobby.m.dalton at nasa.gov (Dalton, Maurice) Date: Tue, 5 Aug 2008 16:56:04 -0500 Subject: [Linux-cluster] 3 node cluster crashes Message-ID: I have a 3 node cluster running cman-2.0.84-2.el5. At times we have spanning tree events that cause network storms up to 9 seconds. When these events occur (today we caused them twice to verify this issue). All three nodes go down within seconds of this event. The second time we tried it I added the totem token statement shown below. Same problem. Aug 5 16:41:18 csarcsys2-eth0 ntpd[3484]: kernel time sync enabled 0001 Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state. Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes). Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Aug 5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 0. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Creating commit token because I am the rep. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Saving state aru 46 high seq received 46 Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Storing new sequence id for ring b50 Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering COMMIT state. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering RECOVERY state. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] position [0] member 172.xx.xx.xxx: Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] previous ring seq 2892 rep 172.xx.xxx.xx Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] aru 46 high delivered 46 received flag 1 Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Did not need to originate any messages in recovery. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Sending initial ORF token Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] CLM CONFIGURATION CHANGE Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] New Configuration: Aug 5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 1 Aug 5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: #1: Quorum Dissolved Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172. xx.xxx.xx) Aug 5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 3 Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Left: Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172. xx.xxx.xx) Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172. xx.xxx.xx) Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Joined: Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CMAN ] quorum lost, blocking activity Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] CLM CONFIGURATION CHANGE Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] New Configuration: Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] r(0) ip(172. xx.xxx.xx) Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate. Refusing connection. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Left: Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] Members Joined: Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111). 
Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [SYNC ] This node is within the primary component and will provide service. Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Someone may be attempting something evil. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state. Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing get: Invalid request descriptor Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM ] got nodejoin message 172.24.86.143 Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate. Refusing connection. Aug 5 16:41:24 csarcsys2-eth0 openais[3096]: [CPG ] got joinlist message from node 2 Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused Aug 5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111). -------------- next part -------------- An HTML attachment was scrubbed... URL: From Norbert.Nemeth at mscibarra.com Wed Aug 6 09:30:27 2008 From: Norbert.Nemeth at mscibarra.com (Nemeth, Norbert) Date: Wed, 6 Aug 2008 11:30:27 +0200 Subject: [Linux-cluster] RE: 3 node cluster crashes In-Reply-To: References: Message-ID: Hi, I have a problem with rgmanager's script resource. My script uses $OCF_RESKEY_service_name in a following way: