From swhiteho at redhat.com  Wed Sep  1 10:53:28 2010
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Wed, 01 Sep 2010 11:53:28 +0100
Subject: [Linux-cluster] GFS2 parameters
In-Reply-To: <971769.26249.qm@web112802.mail.gq1.yahoo.com>
References: <971769.26249.qm@web112802.mail.gq1.yahoo.com>
Message-ID: <1283338408.2462.2.camel@localhost>

Hi,

On Mon, 2010-08-30 at 09:42 -0700, Srija wrote:
> Hi,
>
> I am using gfs2 in a cluster environment,
>
> OS: RHEL 5.5 x86_64,
> kernel: 2.6.18-194.3.1.el5xen
>
> I am trying to tune the GFS file system, but a few parameters are not
> available, like demote_secs.
>
> The error is as follows:
>
> gfs2_tool: can't open /sys/fs/gfs2/gfsred:GFS-XEN-IMAGES/tune/demote_secs: No such file or directory
>
> Can anybody please help me? What am I missing?
>
> Thanks in advance
>
Why are you trying to change demote_secs? Do you have a performance
problem of some kind? This parameter is obsolete and has been removed
in gfs2,

Steve.

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

From bturner at redhat.com  Wed Sep  1 14:48:23 2010
From: bturner at redhat.com (Ben Turner)
Date: Wed, 1 Sep 2010 10:48:23 -0400 (EDT)
Subject: [Linux-cluster] Fencing through iLO and functioning of kdump
In-Reply-To: <545151688.665561283352478743.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <679070528.665741283352503659.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Here is a kbase on fence_scsi; it should answer any questions you have:

https://access.redhat.com/kb/docs/DOC-17809

Usually I try the fence_scsi_test script to be sure my devices are
capable. Note:

"To assist with finding and detecting devices which are (or are not)
suitable for use with fence_scsi, a tool has been provided.
The fence_scsi_test script will find devices visible to the node and
report whether or not they are compatible with SCSI persistent
reservations."

-Ben

----- "Chris Jankowski" wrote:

> Ben,
>
> Thank you for pointing me at fence_scsi.
> It looks like fence_scsi will fit the bill elegantly. And it should be
> much more reliable than iLO fencing if the cluster uses a properly
> configured, dual-fabric FC SAN for shared storage.
>
> I read the fence_scsi manual page and have one more question.
>
> What do I need to do for my cluster to start using SCSI reservations?
> Is this done by default?
>
> Thanks and regards,
>
> Chris Jankowski
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Ben Turner
> Sent: Saturday, 28 August 2010 03:29
> To: linux clustering
> Subject: Re: [Linux-cluster] Fencing through iLO and functioning of kdump
>
> You have a couple of options here:
>
> 1. Switch to fence_scsi (uses SCSI reservations as you described) or
>    another I/O fencing method that does not reboot the system. This
>    will enable your core dump to complete without power fencing
>    interrupting it.
>
> 2. Put in a post fail delay long enough for the crash dump to
>    complete. This is suboptimal, as your cluster services/resources
>    will be hung for the duration of the post fail delay. I usually
>    only do this when I know I have a node that is crashing and no I/O
>    fencing capabilities.
>
> 3. If you don't have access to an I/O fence agent and a post fail
>    delay won't work for some reason, you can try the following, which
>    is the best practice I can think of right now:
>
>    1. Disable the power fence device on the host you're seeing panics
>       on. I have changed the IP for it in cluster.conf in the past.
>    2. When that node fails, the other nodes will attempt to fence the
>       host, and it will fail since the fence device was disabled.
>       (NOTE: between steps 2 and 3, cluster operation is suspended.)
>    3. The administrator can now do things like:
>       - disconnect the FC and network cables from the affected host,
>         ensuring that it is 'manually I/O fenced'
>       - run fence_ack_manual on the other host to override the failed
>         fencing operation and continue cluster operation on the other
>         nodes
>    4. Now the failed host is free to continue kdumping for as long as
>       need be.
>
> Hope this helps.
>
> -b
>
> ----- "Chris Jankowski" wrote:
>
> > Hi,
> >
> > How can I reconcile the need to have Kdump configured and
> > operational on cluster nodes with the need for fencing of a node,
> > most commonly and conveniently implemented through iLO on HP
> > servers?
> >
> > Customers require Kdump configured and operational to be able to
> > have kernel crashes analysed by Red Hat support. The taking of the
> > crash dump starts immediately after the crash, but it may take a
> > very considerable time on a machine with 512 GB of memory (more
> > than an hour) if done at dump level 0 and over a 1 GbE network.
> > However, if I use iLO fencing then the crashed node will be powered
> > off through iLO, which will irrecoverably kill the kernel dump in
> > progress and erase the memory content containing the crashed kernel
> > image.
> >
> > Ideally, I would love to have the functionality that is present in
> > several UNIX clusters, where a crashed node completes its kernel
> > crash dump in peace. In UNIX clusters the crashed node can be
> > configured to reboot automatically after a kernel crash and rejoin
> > the cluster. It typically does the kernel dump as a part of the
> > boot.
> >
> > The UNIX clusters typically use SCSI reservations to protect the
> > integrity of storage. This enables them to keep the failed node
> > isolated whilst it is still able to do the kernel crash dump before
> > rejoining the cluster. I believe this option is not available in
> > Linux Cluster.
> >
> > So, how can I have a functioning Linux cluster with the ability to
> > take a kernel crash dump of crashed nodes, without blocking access
> > to the shared GFS2 filesystem for the hour or so that it may take
> > to write a crash dump on a very large system?
> >
> > Thanks and regards,
> >
> > Chris Jankowski

From rohara at redhat.com  Wed Sep  1 17:11:45 2010
From: rohara at redhat.com (Ryan O'Hara)
Date: Wed, 1 Sep 2010 12:11:45 -0500
Subject: [Linux-cluster] Fencing through iLO and functioning of kdump
In-Reply-To: <679070528.665741283352503659.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
References: <545151688.665561283352478743.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com> <679070528.665741283352503659.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <20100901171145.GD1721@redhat.com>

On Wed, Sep 01, 2010 at 10:48:23AM -0400, Ben Turner wrote:
> Here is a kbase on fence_scsi; it should answer any questions you have:
>
> https://access.redhat.com/kb/docs/DOC-17809
>
> Usually I try the fence_scsi_test script to be sure my devices are
> capable. Note:
>
> "To assist with finding and detecting devices which are (or are not)
> suitable for use with fence_scsi, a tool has been provided. The
> fence_scsi_test script will find devices visible to the node and
> report whether or not they are compatible with SCSI persistent
> reservations."

I just have to comment that fence_scsi_test is rather limited. I'm
currently working on making it more robust, such that it more
accurately tests device(s) for SCSI-PR support.
Basically there are two issues:

1. The current script does not verify that registrations exist on a
   device; it relies on the error code returned from sg_persist. This
   usually works, but we have seen some arrays that will report false
   positives.

2. The script *only* puts a registration on the device(s) and then
   removes the registration from each device. This doesn't tell the
   whole story, since the array must also support the
   preempt-and-abort operation.

A new fence_scsi_test script should be available in the very near
future. Here is the relevant BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=603838

Ryan

From cos at aaaaa.org  Wed Sep  1 18:03:22 2010
From: cos at aaaaa.org (Ofer Inbar)
Date: Wed, 1 Sep 2010 14:03:22 -0400
Subject: [Linux-cluster] logging from the resource agent script
In-Reply-To: <20100824003816.GM18763@mip.aaaaa.org>
References: <20100824003816.GM18763@mip.aaaaa.org>
Message-ID: <20100901180322.GO18256@mip.aaaaa.org>

I got the answers to the questions about ocf_log that I posted last
week, so I'm following up to the list in case
anyone finds these in the list archives and wonders what the solution
was. I wasn't able to find good answers via Google before, so
hopefully this email will fix that :)

On Mon, Aug 23, 2010 at 08:38:16PM -0400, I wrote:
> Right now, the question that's vexing me is how to log custom messages
> from this resource agent script, to give the operator more information
> about what the cluster is doing (such as, for example, the exact
> commands that are run when starting and stopping the service, or what
> the real return code from the health check is, rather than just "did
> it fail?").

> 1. ocf_log statements I put at the top level of the script do log,
> but any that I put inside functions such as start() and stop() don't.
> Why don't my custom log messages appear in /var/log/messages when
> other messages at the same level (such as info or notice) from
> rgmanager do, and when the start() or stop() function is clearly being
> called?
>
> 2. ocf_log seems to sometimes, or always, output to stdout, which
> means I have to take care *not* to let it run when meta-data is the
> argument, because it'd pollute the metadata XML. But then how do I
> log anything from the times the script is run for metadata, if I want?
>
> Should this work? Is there another, better way of making resource
> agent scripts log custom messages?
>
> And what happens to the resource agent script's stdout, anyway?

So, first of all, the resource agent script's stdout and stderr are
tied to /dev/null *except* when it's being called for meta-data.
They are not logged anywhere.

Secondly, the problem with ocf_log not logging was very simple, but
obfuscated by the fact that stderr was thrown to the bit bucket.
ocf_log is a shell function which always outputs to stdout and also
calls a separate program called clulog to send stuff to syslog. It
assumes clulog is in the path, which means the resource agent needs
/usr/sbin in its path, which was missing from my script.
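A minimal sketch of the PATH fix described above, assuming a
Bourne-shell resource agent. The path_add helper is a hypothetical
name introduced here for illustration; it is not part of rgmanager or
of the OCF shell functions. It appends a directory to PATH only when
that directory is missing, so that ocf_log should be able to find
clulog in /usr/sbin:

```shell
#!/bin/sh
# path_add is a hypothetical helper (not provided by rgmanager).
# It appends directory $1 to PATH unless PATH already contains it.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;       # append at the end
    esac
}

# /usr/sbin is where clulog lived in the setup described above;
# adjust if your installation differs.
path_add /usr/sbin
export PATH
```

Putting this near the top of the agent, before the first ocf_log
call, keeps the rest of the script unchanged.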
A simple oversight; it would've been obvious if I'd seen the
"clulog: command not found" errors.

One potential hitch is that ocf_log just passes its string argument to
clulog on the command line enclosed in double quotes, so you could
have shell quoting issues. Quoting once (in your call to ocf_log in
the resource agent script) is not necessarily enough; there's going to
be a second level of shell interpolation, though it's double-quoted.
One failure would be if you start your string with a - character,
because then clulog will think it's another command line switch.

Note: My confusion about ocf_log "sometimes" sending to stdout was
caused by the fact that the resource agent's stdout was going to
/dev/null except when it was being called for meta-data. ocf_log
always writes to stdout, and rgmanager was sometimes looking at it and
sometimes bitbucketing it.

Finally, a very useful debugging tool I was not aware of when I first
asked the question, which makes it much easier to see what's going on:

  rg_test test /etc/cluster/cluster.conf [status|start|stop] service [service]

(run as root, or with sudo)

This runs your resource agent as rgmanager would, but shows you stdout
and stderr.
  -- Cos

From christopher.walker at gmail.com  Wed Sep  1 20:40:54 2010
From: christopher.walker at gmail.com (Chris Walker)
Date: Wed, 1 Sep 2010 16:40:54 -0400
Subject: [Linux-cluster] active/active NFS cluster question
Message-ID:

Hello,

I suspect that I'm doing something fairly stupid, but I'm having a
problem with a cluster that is exporting the same GFS filesystems to
the same nfs clients. Everything is fine until I relocate one of the
nfs services to another machine. The relocation goes fine, but once I
have two nfs services on the same machine, I can't get them apart.
When I relocate one of the two nfs services to a different cluster
host, the relocation wipes out the entries in /var/lib/nfs/etab,
forcing the second nfs service on that node to relocate as well (I get
the error "nfsclient:rc_nfs_clients is missing!").

Any suggestions? Other than modifying
/usr/share/cluster/nfsclient.sh, is there some way to prevent the etab
entries from being purged?

Thanks!
Chris

cluster.conf:

From fdinitto at redhat.com  Thu Sep  2 12:45:18 2010
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Thu, 02 Sep 2010 14:45:18 +0200
Subject: [Linux-cluster] Cluster 3.0.16 stable release
Message-ID: <4C7F9C5E.6000606@redhat.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

The cluster team and its community are proud to announce the 3.0.16
stable release from the STABLE3 branch.

This release contains a few major bug fixes. We strongly recommend
people to update their clusters.

We also welcome Digimer to the development team and her contribution
of fence_nodeassassin in time for this release.

(If you are wondering where 3.0.15 is, the tarballs are available at
the usual URL, but due to a change in some headers, it probably will
not build on your system. 3.0.16 addresses that problem specifically.)

In order to build/run the 3.0.16 release you will need:

- - corosync 1.2.8
- - openais 1.1.4
- - linux kernel 2.6.31 (only for GFS1 users)

The new source tarball can be downloaded here:

https://fedorahosted.org/releases/c/l/cluster/cluster-3.0.16.tar.bz2

To report bugs or issues:

https://bugzilla.redhat.com/

Would you like to meet the cluster team or members of its community?

Join us on IRC (irc.freenode.net #linux-cluster) and share your
experience with other system administrators or power users.

Thanks/congratulations to all people that contributed to achieve this
great milestone.

Happy clustering,
Fabio

Under the hood (from 3.0.15):

Fabio M. Di Nitto (1):
      cman: fix build with old headers (f12 and older)

 cman/cman_tool/main.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Under the hood (from 3.0.14):

Bob Peterson (3):
      gfs2-utils: mkfs can't fsync device with 32MB RGs
      fsck.gfs2 deletes directories if they get too big
      fsck.gfs2 segfaults if journals are missing

David Teigland (5):
      dlm_controld: fix save_plocks initialization
      dlm_controld: fix plock owner syncing
      dlm_controld: fix plock signature in stored message
      Revert "dlm_controld: fix save_plocks initialization"
      dlm_controld: ignore plocks until checkpoint time

Fabio M. Di Nitto (11):
      cman: do not propagate old configurations around
      fence_na: import files pristine from upstream
      build: fix man page install from outside source tree
      fence_na: first cut at the Makefile
      fence_na: add example config file
      build: rename CONFFILEEXAMPLE to EXTRACONFFILE
      fence_na: generate files based on configure invokation
      fence_na: fix last installation bits required to work in our build env
      fence_na: add copyright/author information
      fence_na: add support to the validation schema
      config: Update ldif schema

Lon Hohberger (13):
      cman: Make qdiskd exit if removed from configuration
      cman: Clarify man page on config distribution
      rgmanager: Fix clustat return code
      rgmanager: Honor restricted FDs during migrations
      config: Add missing fence-agent options to RNG schema
      config: Add missing fence-agent options to LDAP schema
      rgmanager: Present flags in clustat output
      config: Fix broken fence_egenera options
      config: Add fence_egenera options to ldif
      doc: Update autogenerated documentation
      rgmanager: fix compiler warning in clulog.c
      config: Present fencing agent name in metadata
      config: Add fencing agent name to group for clarity

Marek 'marx' Grac (5):
      fence_drac5: make "port" a synonym of "module_name" for drac5
      fencing: Method to cause one node to delay fencing
      fencing: Method to cause one node to delay fencing [2]
      fencing: Method to cause one node to delay fencing - drac, egenera
      fencing: Method to cause one node to delay fencing - ipmilan

Ryan O'Hara (1):
      Fix syntax error in code that opens logfile.

 cman/cman_tool/main.c                       |  37 +-
 cman/man/cman_tool.8                        |  24 +-
 cman/qdisk/main.c                           |  33 +-
 config/plugins/ldap/99cluster.ldif          |  98 +++-
 config/plugins/ldap/ldap-base.csv           |  11 +-
 config/tools/xml/ccs_config_validate.in     |  34 +-
 config/tools/xml/cluster.rng.in             | 565 +++++++++++++----
 doc/COPYRIGHT                               |   3 +
 doc/cluster_conf.html                       |  26 +-
 fence/agents/drac/fence_drac.8              |   6 +
 fence/agents/drac/fence_drac.pl             |  13 +-
 fence/agents/egenera/fence_egenera.8        |   6 +
 fence/agents/egenera/fence_egenera.pl       |  10 +-
 fence/agents/ipmilan/ipmilan.c              |  29 +-
 fence/agents/lib/fence2rng.xsl              |   3 +-
 fence/agents/lib/fencing.py.py              |  23 +-
 fence/agents/node_assassin/Makefile         |  50 ++
 fence/agents/node_assassin/fence_na.conf.in |  84 +++
 fence/agents/node_assassin/fence_na.lib.in  | 919 +++++++++++++++++++++++++++
 fence/agents/node_assassin/fence_na.pl      | 162 +++++
 fence/agents/node_assassin/fence_na.pod.in  | 188 ++++++
 fence/agents/scsi/fence_scsi.pl             |   2 +-
 gfs2/convert/gfs2_convert.c                 |  12 +-
 gfs2/fsck/fs_recovery.c                     |  55 ++-
 gfs2/fsck/fs_recovery.h                     |   7 +-
 gfs2/fsck/metawalk.c                        |  45 +-
 gfs2/fsck/metawalk.h                        |   5 +-
 gfs2/fsck/pass1.c                           | 144 +++--
 gfs2/fsck/pass1b.c                          |  15 +-
 gfs2/libgfs2/libgfs2.h                      |   5 +-
 gfs2/libgfs2/rgrp.c                         |  14 +-
 gfs2/libgfs2/structures.c                   |  32 +-
 gfs2/libgfs2/super.c                        |  43 +--
 group/dlm_controld/cpg.c                    |  43 ++-
 group/dlm_controld/dlm_daemon.h             |   1 +
 group/dlm_controld/plock.c                  | 120 +++--
 make/install.mk                             |   6 +-
 make/uninstall.mk                           |   3 +
 rgmanager/src/daemons/rg_state.c            |   5 +
 rgmanager/src/utils/clulog.c                |   2 +-
 rgmanager/src/utils/clustat.c               |  53 ++-
 41 files changed, 2563 insertions(+), 373 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJMf5xcAAoJEFA6oBJjVJ+OZLEP/Rsvn6n1T29WCtlQCqdK/Ux0
Ljc2Py/JcPaptuR2oeqDAAjNSb5WnI8kBNMp5XJ0bbegn72m1OVCNTyCTnuvHGHe
CPwFLx7WfOpusBayhHzpErPBTjBMROt4noZI9+iWSkbjr1YERPowbBZ3NRpjKaye
QIX/Z4Wc7loqevzeg3h8HYmhf2Ka7t3VsMKzmGMRdUKeuFqUI6XWvqd8Q8YxR4gd
2mgu4OODHgvv7dD/vt1OSRI62/uUT92R5edRuK7Y0FizQ0ujWWOv10KsfAULzLKI
fLqhBaX29OZE68AeAkfSZ98p5E7vreVTXT0QAds6kIVjw53ZRJ9LH57pEzB6vMmh
xzb4vjD8ChU3WNCYE1GYDxF28cBHTzintNv1MNiSFAP1vC1r7UaAZ6GJGztE506a
ZGT/wOOfgFmkk0u1oT6cPwnkMXbIHDJVPqd1Ds+M0Pz3UMNZ+ta9k8YnkOkgPrZJ
Lne81a51u7wLqKc+2BD34TBwxSpETL4oHiYR5wnWVjugsipBnPV9f5bZMwOtsOSm
bi6/r8NcVY9wsuo28wIBFISkyyyppw7v0ohUS/nne3Colr3dJJAJB73nYTtenPQd
Ir9219EyRCevLrmI9K+7a9GSBPulIFXkGWXJLFBu/lyWfUSOHU5uGC/ZG8kJwEus
rug84hGXCJ06GNZmtoAc
=LV1N
-----END PGP SIGNATURE-----

From girishpati at yahoo.com  Wed Sep  8 10:05:30 2010
From: girishpati at yahoo.com (Girish Prajapati)
Date: Wed, 8 Sep 2010 03:05:30 -0700 (PDT)
Subject: [Linux-cluster] need help - Fencing problem
Message-ID: <178789.16151.qm@web120516.mail.ne1.yahoo.com>

Hello everybody,

I am having a problem fencing a cluster node. Let me explain in
detail:

I have installed RHEL 5.4 on HP ProLiant DL280 G5 servers, with iLO 2
as the fencing device. I am managing the cluster through Luci (Conga).
It seems everything is working fine: I can reboot cluster nodes
through Luci and the service gets transferred to another node. After
rebooting, the node reconnects to the cluster automatically without
any error.

The problem is that I cannot fence a node through Luci. When I try to
fence any node, I get the following error:

Sep  8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Sep  8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was unsuccessful

My iLO license is: iLO 2 Advanced Evaluation

Do I need to have an iLO license, or is there a problem in the cluster
configuration? How can I check the cluster log in detail?

Appreciate your help.
Thank you in advance.

Regards,
Girishkumar R Prajapati
From jacob.ishak at gmail.com  Wed Sep  8 11:09:58 2010
From: jacob.ishak at gmail.com (jacob ishak)
Date: Wed, 8 Sep 2010 14:09:58 +0300
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <178789.16151.qm@web120516.mail.ne1.yahoo.com>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com>
Message-ID:

It might be an iLO login issue; check the fencing device
authentication type. I faced this issue on a Sun ILOM fencing device:
I changed the authentication type to md5 and it worked.

From cluster.conf:

fencedevice agent="fence_ipmilan" auth="md5"
URL: From girishpati at yahoo.com Wed Sep 8 12:24:15 2010 From: girishpati at yahoo.com (Girish Prajapati) Date: Wed, 8 Sep 2010 05:24:15 -0700 (PDT) Subject: [Linux-cluster] need help - Fencing problem In-Reply-To: References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> Message-ID: <550346.91123.qm@web120518.mail.ne1.yahoo.com> Hello Jecob, Thanks for your reply.. i try to change as you explain but still there is same problem. When i try to fence any node from Luci, i get?following error on the web browser : --? Unable to retrieve batch 1223037152 status from node2.drctmb.com:11111: fence_node failed: Node "node1.drctmb.com" is being fenced by node "node2.drctmb.com" -- You will be redirected in 5 seconds. ??? Stop waiting for this job to complete --Unable to retrieve batch 719909649 status from node1.drctmb.com:11111: fence_node failed: Node "node2.drctmb.com" is being fenced by node "node1.drctmb.com" -- You will be redirected in 5 seconds. ??? Stop waiting for this job to complete any idea why am gettting this error message? ?? Regards, Girishkumar R Prajapati ________________________________ From: jacob ishak To: linux clustering Sent: Wed, September 8, 2010 1:09:58 PM Subject: Re: [Linux-cluster] need help - Fencing problem it might be ilo login issue check fencing device authentication type i faced this isse on Sun ILOM fencing device , i changed authentication type to md5 and it worked from cluster.conf: fencedevice agent="fence_ipmilan" auth="md5" On Wed, Sep 8, 2010 at 1:05 PM, Girish Prajapati wrote: Hello Everybody, >i am having problem of fencing a cluster node? let me explain indetail : >I have installed RHEL 5.4 on? HP Prolaint DL280 G5 servers and iLO 2as fencing >device. Am managing cluster through Luci - (Conga). itseems everything is >working fine. I can reboot cluster nodes through Luci and service get transfer >to another node. After rebooting node connect to cluster automatically without >any error. 
>Problem is i can not do Fence this node through Luci, when i try to fence any >node i get following error : > >Sep? 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to >connect/login to fencing device >Sep? 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was >unsuccessful > >my iLO license is : iLO 2 Advanced Evaluation >Do i need to have? license of iLO or there is problem in configuration of >cluster ? >how i can check cluster log in details. > >Appreciate your help. >Thank you in advance. > >Regards, >Girishkumar R Prajapati > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From esggrupos at gmail.com Wed Sep 8 12:57:25 2010 From: esggrupos at gmail.com (ESGLinux) Date: Wed, 8 Sep 2010 14:57:25 +0200 Subject: [Linux-cluster] need help - Fencing problem In-Reply-To: <178789.16151.qm@web120516.mail.ne1.yahoo.com> References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> Message-ID: Hello, Have you configured the iLO devices entering in the BIOS? I remenber I have to set up the user/pass in the iLO and marked the iLo as not shared HTH, ESG 2010/9/8 Girish Prajapati > Hello Everybody, > i am having problem of fencing a cluster node let me explain indetail : > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as > fencing device. Am managing cluster through Luci - (Conga). itseems > everything is working fine. I can reboot cluster nodes through Luci and > service get transfer to another node. After rebooting node connect to > cluster automatically without any error. 
> Problem is i can not do Fence this node through Luci, when i try to fence > any node i get following error : > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was > unsuccessful > > my iLO license is : iLO 2 Advanced Evaluation > Do i need to have license of iLO or there is problem in configuration of > cluster ? > how i can check cluster log in details. > > Appreciate your help. > Thank you in advance. > > Regards, > Girishkumar R Prajapati > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Jankowski at hp.com Wed Sep 8 22:30:57 2010 From: Chris.Jankowski at hp.com (Jankowski, Chris) Date: Wed, 8 Sep 2010 22:30:57 +0000 Subject: [Linux-cluster] need help - Fencing problem In-Reply-To: References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> Message-ID: <036B68E61A28CA49AC2767596576CD596BACEB2F3D@GVW1113EXC.americas.hpqcorp.net> Why did you have to set iLO as non-shared? Thank and regards, Chris ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of ESGLinux Sent: Wednesday, 8 September 2010 22:57 To: linux clustering Subject: Re: [Linux-cluster] need help - Fencing problem Hello, Have you configured the iLO devices entering in the BIOS? I remenber I have to set up the user/pass in the iLO and marked the iLo as not shared HTH, ESG 2010/9/8 Girish Prajapati > Hello Everybody, i am having problem of fencing a cluster node let me explain indetail : I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as fencing device. Am managing cluster through Luci - (Conga). itseems everything is working fine. I can reboot cluster nodes through Luci and service get transfer to another node. 
After rebooting node connect to cluster automatically without any error.
Problem is i can not do Fence this node through Luci, when i try to fence any node i get following error :

Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was unsuccessful

my iLO license is : iLO 2 Advanced Evaluation
Do i need to have license of iLO or there is problem in configuration of cluster ?
how i can check cluster log in details.

Appreciate your help.
Thank you in advance.

Regards,
Girishkumar R Prajapati

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From girishpati at yahoo.com Thu Sep 9 05:29:51 2010
From: girishpati at yahoo.com (Girish Prajapati)
Date: Wed, 8 Sep 2010 22:29:51 -0700 (PDT)
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To:
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com>
Message-ID: <541653.54252.qm@web120512.mail.ne1.yahoo.com>

Hello...

I have already configure BIOS for iLO.. but am not sure why i don need to shared ??
please anybody can help me out for this problem.
Do i need any extra setup for fencing device ?
thanks

________________________________
From: ESGLinux
To: linux clustering
Sent: Wed, September 8, 2010 2:57:25 PM
Subject: Re: [Linux-cluster] need help - Fencing problem

Hello,

Have you configured the iLO devices entering in the BIOS?

I remenber I have to set up the user/pass in the iLO and marked the iLo as not shared

HTH,

ESG

2010/9/8 Girish Prajapati

Hello Everybody,
>i am having problem of fencing a cluster node let me explain indetail :
>I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as fencing
>device. Am managing cluster through Luci - (Conga). itseems everything is
>working fine. I can reboot cluster nodes through Luci and service get transfer
>to another node. After rebooting node connect to cluster automatically without
>any error.
>Problem is i can not do Fence this node through Luci, when i try to fence any
>node i get following error :
>
>Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to
>connect/login to fencing device
>Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was
>unsuccessful
>
>my iLO license is : iLO 2 Advanced Evaluation
>Do i need to have license of iLO or there is problem in configuration of
>cluster ?
>how i can check cluster log in details.
>
>Appreciate your help.
>Thank you in advance.
>
>Regards,
>Girishkumar R Prajapati
>
>--
>Linux-cluster mailing list
>Linux-cluster at redhat.com
>https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brem.belguebli at gmail.com Thu Sep 9 06:00:28 2010
From: brem.belguebli at gmail.com (Brem Belguebli)
Date: Thu, 09 Sep 2010 08:00:28 +0200
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <541653.54252.qm@web120512.mail.ne1.yahoo.com>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <541653.54252.qm@web120512.mail.ne1.yahoo.com>
Message-ID: <1284012028.3342.3.camel@newgen.localdomain>

try run this from another node of the cluster

fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot

Additionnally, by connecting thru http to the Ilo, you should be able to
see Ilo logs (in the general tab) and see if it is due to a lack of
licensing

On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote:
> Hello...
>
> I have already configure BIOS for iLO.. but am not sure why i don need
> to shared ??
> please anybody can help me out for this problem.
> Do i need any extra setup for fencing device ?
> thanks > > > > ______________________________________________________________________ > From: ESGLinux > To: linux clustering > Sent: Wed, September 8, 2010 2:57:25 PM > Subject: Re: [Linux-cluster] need help - Fencing problem > > Hello, > > > Have you configured the iLO devices entering in the BIOS? > > > I remenber I have to set up the user/pass in the iLO and marked the > iLo as not shared > > > > > HTH, > > > ESG > > 2010/9/8 Girish Prajapati > Hello Everybody, > i am having problem of fencing a cluster node let me explain > indetail : > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and > iLO 2as fencing device. Am managing cluster through Luci - > (Conga). itseems everything is working fine. I can reboot > cluster nodes through Luci and service get transfer to another > node. After rebooting node connect to cluster automatically > without any error. > Problem is i can not do Fence this node through Luci, when i > try to fence any node i get following error : > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" > reports: Unable to connect/login to fencing device > Sep 8 14:51:16 node2 fence_node[9106]: Fence of > "node1.drctmb.com" was unsuccessful > > my iLO license is : iLO 2 Advanced Evaluation > Do i need to have license of iLO or there is problem in > configuration of cluster ? > how i can check cluster log in details. > > Appreciate your help. > Thank you in advance. 
> > Regards, > Girishkumar R Prajapati > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From girishpati at yahoo.com Thu Sep 9 07:43:45 2010 From: girishpati at yahoo.com (Girish Prajapati) Date: Thu, 9 Sep 2010 00:43:45 -0700 (PDT) Subject: [Linux-cluster] need help - Fencing problem In-Reply-To: <1284012028.3342.3.camel@newgen.localdomain> References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <541653.54252.qm@web120512.mail.ne1.yahoo.com> <1284012028.3342.3.camel@newgen.localdomain> Message-ID: <744023.36408.qm@web120512.mail.ne1.yahoo.com> Hello, i can run following command successfully from another node but still getting same error message : fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined: Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the primary component and will provide service. Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL state. 
Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message 192.168.0.28 Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from node 1 Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster member after 0 sec post_fail_delay Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com" Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable to connect/login to fencing device Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com" Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable to connect/login to fencing device node1 rebooted and get connect to the cluster but now my webby service not working see below log : Broadcast message from root (Thu Sep 9 14:32:41 2010): The system is going down for system halt NOW! Sep 9 14:19:22 node1 last message repeated 17 times Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found Sep 9 14:32:43 node1 modclusterd: shutdown succeeded Sep 9 14:32:43 node1 rgmanager: [25593]: Shutting down Cluster Service Manager... Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down Sep 9 14:32:43 node1 clurgmgrd[3457]: Stopping service service:webby Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record for 192.168.0.30 on eth0. Read from remote host node1: Connection reset by peer . . . Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices [this device CD/DVD] not SMART capable Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART features Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background mode. 
New PID=3604. Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer on node1" (/services/sftp-ssh.service) successfully established. Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found Sep 9 14:35:45 node1 last message repeated 3 times Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000 old: uncachable new: write-combining Sep 9 14:35:46 node1 clurgmgrd: [3491]: Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed - File Doesn't Exist It seems that there problem in fencing device configuration. Please find here my cluster.conf : ~ This is first time am working on Clustering so please help me. Appreciate your help. Thank you. ________________________________ From: Brem Belguebli To: linux clustering Sent: Thu, September 9, 2010 11:30:28 AM Subject: Re: [Linux-cluster] need help - Fencing problem try run this from another node of the cluster fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot Additionnally, by connecting thru http to the Ilo, you should be able to see Ilo logs (in the general tab) and see if it is due to a lack of licensing On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote: > Hello... > > I have already configure BIOS for iLO.. but am not sure why i don need > to shared ?? > please anybody can help me out for this problem. > Do i need any extra setup for fencing device ? > thanks > > > > ______________________________________________________________________ > From: ESGLinux > To: linux clustering > Sent: Wed, September 8, 2010 2:57:25 PM > Subject: Re: [Linux-cluster] need help - Fencing problem > > Hello, > > > Have you configured the iLO devices entering in the BIOS? 
> > > I remenber I have to set up the user/pass in the iLO and marked the > iLo as not shared > > > > > HTH, > > > ESG > > 2010/9/8 Girish Prajapati > Hello Everybody, > i am having problem of fencing a cluster node let me explain > indetail : > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and > iLO 2as fencing device. Am managing cluster through Luci - > (Conga). itseems everything is working fine. I can reboot > cluster nodes through Luci and service get transfer to another > node. After rebooting node connect to cluster automatically > without any error. > Problem is i can not do Fence this node through Luci, when i > try to fence any node i get following error : > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" > reports: Unable to connect/login to fencing device > Sep 8 14:51:16 node2 fence_node[9106]: Fence of > "node1.drctmb.com" was unsuccessful > > my iLO license is : iLO 2 Advanced Evaluation > Do i need to have license of iLO or there is problem in > configuration of cluster ? > how i can check cluster log in details. > > Appreciate your help. > Thank you in advance. > > Regards, > Girishkumar R Prajapati > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From girishpati at yahoo.com Thu Sep 9 08:05:23 2010
From: girishpati at yahoo.com (Girish Prajapati)
Date: Thu, 9 Sep 2010 01:05:23 -0700 (PDT)
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <1284012028.3342.3.camel@newgen.localdomain>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <541653.54252.qm@web120512.mail.ne1.yahoo.com> <1284012028.3342.3.camel@newgen.localdomain>
Message-ID: <374384.92522.qm@web120505.mail.ne1.yahoo.com>

Hello,
i can run following command successfully from another node but still getting same error message :

fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot

Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined:
Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the primary component and will provide service.
Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL state.
Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message 192.168.0.28
Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from node 1
Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster member after 0 sec post_fail_delay
Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com"
Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed
Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com"
Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable to connect/login to fencing device

node1 rebooted and get connect to the cluster but now my webby service not working see below log :

Broadcast message from root (Thu Sep 9 14:32:41 2010):
The system is going down for system halt NOW!
Sep 9 14:19:22 node1 last message repeated 17 times
Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt
Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Sep 9 14:32:43 node1 modclusterd: shutdown succeeded
Sep 9 14:32:43 node1 rgmanager: [25593]: Shutting down Cluster Service Manager...
Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down
Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down
Sep 9 14:32:43 node1 clurgmgrd[3457]: Stopping service service:webby
Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record for 192.168.0.30 on eth0.
Read from remote host node1: Connection reset by peer
.
.
.
Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices [this device CD/DVD] not SMART capable
Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened
Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART features
Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices
Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background mode. New PID=3604.
Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer on node1" (/services/sftp-ssh.service) successfully established.
Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Sep 9 14:35:45 node1 last message repeated 3 times
Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000 old: uncachable new: write-combining
Sep 9 14:35:46 node1 clurgmgrd: [3491]: Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed - File Doesn't Exist

It seems that there problem in fencing device configuration.
Please find here my cluster.conf :
~

This is first time am working on Clustering so please help me.
Appreciate your help.

Thank you.

________________________________
From: Brem Belguebli
To: linux clustering
Sent: Thu, September 9, 2010 11:30:28 AM
Subject: Re: [Linux-cluster] need help - Fencing problem

try run this from another node of the cluster

fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot

Additionnally, by connecting thru http to the Ilo, you should be able to
see Ilo logs (in the general tab) and see if it is due to a lack of
licensing

On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote:
> Hello...
>
> I have already configure BIOS for iLO.. but am not sure why i don need
> to shared ??
> please anybody can help me out for this problem.
> Do i need any extra setup for fencing device ?
> thanks
>
> ______________________________________________________________________
> From: ESGLinux
> To: linux clustering
> Sent: Wed, September 8, 2010 2:57:25 PM
> Subject: Re: [Linux-cluster] need help - Fencing problem
>
> Hello,
>
> Have you configured the iLO devices entering in the BIOS?
>
> I remenber I have to set up the user/pass in the iLO and marked the
> iLo as not shared
>
> HTH,
>
> ESG
>
> 2010/9/8 Girish Prajapati
>         Hello Everybody,
>         i am having problem of fencing a cluster node let me explain
>         indetail :
>         I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and
>         iLO 2as fencing device. Am managing cluster through Luci -
>         (Conga). itseems everything is working fine. I can reboot
>         cluster nodes through Luci and service get transfer to another
>         node. After rebooting node connect to cluster automatically
>         without any error.
>         Problem is i can not do Fence this node through Luci, when i
>         try to fence any node i get following error :
>
>         Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo"
>         reports: Unable to connect/login to fencing device
>         Sep 8 14:51:16 node2 fence_node[9106]: Fence of
>         "node1.drctmb.com" was unsuccessful
>
>         my iLO license is : iLO 2 Advanced Evaluation
>         Do i need to have license of iLO or there is problem in
>         configuration of cluster ?
>         how i can check cluster log in details.
>
>         Appreciate your help.
>         Thank you in advance.
>
>         Regards,
>         Girishkumar R Prajapati
>
>         --
>         Linux-cluster mailing list
>         Linux-cluster at redhat.com
>         https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From esggrupos at gmail.com Thu Sep 9 08:51:47 2010
From: esggrupos at gmail.com (ESGLinux)
Date: Thu, 9 Sep 2010 10:51:47 +0200
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <036B68E61A28CA49AC2767596576CD596BACEB2F3D@GVW1113EXC.americas.hpqcorp.net>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <036B68E61A28CA49AC2767596576CD596BACEB2F3D@GVW1113EXC.americas.hpqcorp.net>
Message-ID:

Hi,

the only reason was that when I used it as shared, the speed of this device was very, very low. Marked as non-shared, it works fine.

I don't know the reason. It was a trial-and-error test.

Greetings,

ESG

2010/9/9 Jankowski, Chris

> Why did you have to set iLO as non-shared?
>
> Thank and regards,
>
> Chris
>
> ------------------------------
> *From:* linux-cluster-bounces at redhat.com [mailto:
> linux-cluster-bounces at redhat.com] *On Behalf Of *ESGLinux
> *Sent:* Wednesday, 8 September 2010 22:57
> *To:* linux clustering
>
> *Subject:* Re: [Linux-cluster] need help - Fencing problem
>
> Hello,
>
> Have you configured the iLO devices entering in the BIOS?
>
> I remenber I have to set up the user/pass in the iLO and marked the iLo as
> not shared
>
>
> HTH,
>
> ESG
>
> 2010/9/8 Girish Prajapati
>
>> Hello Everybody,
>> i am having problem of fencing a cluster node let me explain indetail :
>> I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as
>> fencing device. Am managing cluster through Luci - (Conga). itseems
>> everything is working fine. I can reboot cluster nodes through Luci and
>> service get transfer to another node. After rebooting node connect to
>> cluster automatically without any error.
>> Problem is i can not do Fence this node through Luci, when i try to fence
>> any node i get following error :
>>
>> Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable
>> to connect/login to fencing device
>> Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was
>> unsuccessful
>>
>> my iLO license is : iLO 2 Advanced Evaluation
>> Do i need to have license of iLO or there is problem in configuration of
>> cluster ?
>> how i can check cluster log in details.
>>
>> Appreciate your help.
>> Thank you in advance.
>>
>> Regards,
>> Girishkumar R Prajapati
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rhurst at bidmc.harvard.edu Thu Sep 9 13:34:20 2010
From: rhurst at bidmc.harvard.edu (rhurst at bidmc.harvard.edu)
Date: Thu, 9 Sep 2010 09:34:20 -0400
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <178789.16151.qm@web120516.mail.ne1.yahoo.com>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com>
Message-ID: <50168EC934B8D64AA8D8DD37F840F3DE05640628E6@EVS2CCR.its.caregroup.org>

For what it is worth, our experiences with HP iLO management cards:

iLO found on G1 servers does not need to be licensed, AFAIK, it does not have the option to do so anyways.

iLO2 found on G2 and beyond does not need to be licensed either, if you are only using it as a fencing device. We licensed all of ours, because it enabled useful KVM with remote media capabilities that are superior to our Raritan KVM infrastructure.

Both management cards should have their firmware updated -- they were both problematic to us as factory-shipped, but applying their update packs allowed them to work as advertised.

Also, can't you add "-v" for verbose output and also something like "-D /tmp/fence.out" to save debugging info to an output file? It might help some to see where exactly the failure is occurring. Good luck.

________________________________
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Girish Prajapati
Sent: Wednesday, September 08, 2010 6:06 AM
To: Linux-cluster at redhat.com
Subject: [Linux-cluster] need help - Fencing problem

Hello Everybody,
i am having problem of fencing a cluster node let me explain indetail :
I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as fencing device. Am managing cluster through Luci - (Conga). itseems everything is working fine. I can reboot cluster nodes through Luci and service get transfer to another node. After rebooting node connect to cluster automatically without any error.
Problem is i can not do Fence this node through Luci, when i try to fence any node i get following error :

Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device
Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was unsuccessful

my iLO license is : iLO 2 Advanced Evaluation
Do i need to have license of iLO or there is problem in configuration of cluster ?
how i can check cluster log in details.

Appreciate your help.
Thank you in advance.

Regards,
Girishkumar R Prajapati
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nehemiasjahcob at gmail.com Thu Sep 9 14:18:31 2010
From: nehemiasjahcob at gmail.com (Nehemias Jahcob)
Date: Thu, 9 Sep 2010 10:18:31 -0400
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <50168EC934B8D64AA8D8DD37F840F3DE05640628E6@EVS2CCR.its.caregroup.org>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <50168EC934B8D64AA8D8DD37F840F3DE05640628E6@EVS2CCR.its.caregroup.org>
Message-ID:

1.) Can you increase the verbosity level for troubleshooting?
---- * -----
#ccs_tool update /etc/cluster/cluster.conf
Copy-paste /var/log/messages

2.) What version of PSP do you have installed?

3.) If nothing works, I recommend using fence_ipmi

Greetings!

2010/9/9

> For what it is worth, our experiences with HP iLO management cards:
>
> iLO found on G1 servers does not need to be licensed, AFAIK, it does not
> have the option to do so anyways.
>
> iLO2 found on G2 and beyond does not need to be licensed either, if you are
> only using it as a fencing device. We licensed all of ours, because it
> enabled useful KVM with remote media capabilities that are superior than our
> Raritan KVM infrastructure.
>
> Both management cards should have their firmware updated -- they were both
> problematic to us as factory-shipped, but applying their update
> packs allowed them to work as advertised.
>
> Also, can't you add "-v" for verbose output and also something like "-D
> /tmp/fence.out" to save debugging info to an output file? It might help
> some to see where exactly the failure is occurring. Good luck.
>
> ------------------------------
> *From:* linux-cluster-bounces at redhat.com [mailto:
> linux-cluster-bounces at redhat.com] *On Behalf Of *Girish Prajapati
> *Sent:* Wednesday, September 08, 2010 6:06 AM
> *To:* Linux-cluster at redhat.com
> *Subject:* [Linux-cluster] need help - Fencing problem
>
> Hello Everybody,
> i am having problem of fencing a cluster node let me explain indetail :
> I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as
> fencing device. Am managing cluster through Luci - (Conga). itseems
> everything is working fine. I can reboot cluster nodes through Luci and
> service get transfer to another node. After rebooting node connect to
> cluster automatically without any error.
> Problem is i can not do Fence this node through Luci, when i try to fence
> any node i get following error :
>
> Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable
> to connect/login to fencing device
> Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was
> unsuccessful
>
> my iLO license is : iLO 2 Advanced Evaluation
> Do i need to have license of iLO or there is problem in configuration of
> cluster ?
> how i can check cluster log in details.
>
> Appreciate your help.
> Thank you in advance.
>
> Regards,
> Girishkumar R Prajapati
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bturner at redhat.com Thu Sep 9 15:58:45 2010
From: bturner at redhat.com (Ben Turner)
Date: Thu, 9 Sep 2010 11:58:45 -0400 (EDT)
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <374384.92522.qm@web120505.mail.ne1.yahoo.com>
Message-ID: <155361964.174311284047925612.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>

Judging from:

"Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device"

Chances are you are not using the correct username/password/IP or the ilo is not configured for telnet logins. Try the following:

1. Log in to the ilo via telnet from the command line. Be sure to use the username/password/IP you have in cluster.conf.

2. If that is successful try:

# fence_ilo -v -a "Ilo IP from cluster.conf" -l "Ilo user from cluster.conf" -p "Ilo passwd from cluster.conf" -o status

The -v will display exactly what the fence agent sees and is very useful for debugging failing fences. If the status fails send me the output.

3. If the fence_ilo is successful try:

# fence_node

If all 3 are successful then fencing is set up properly and there may be a problem running it from Luci. If any of the 3 fail, post the error back to the list and I'll look at it.

-Ben

----- "Girish Prajapati" wrote:

> Hello,
> i can run following command successfully from another node but still
> getting same error message :
>
> fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
>
> Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined:
> Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the
> primary component and will provide service.
> Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL
> state.
> Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message
> 192.168.0.28
> Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from
> node 1
> Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster
> member after 0 sec post_fail_delay
> Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com"
> Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> to connect/login to fencing device
> Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed
> Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com"
> Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable
> to connect/login to fencing device
>
> node1 rebooted and get connect to the cluster but now my webby service
> not working see below log :
>
> Broadcast message from root (Thu Sep 9 14:32:41 2010):
> The system is going down for system halt NOW!
> Sep 9 14:19:22 node1 last message repeated 17 times
> Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt
> Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader
> E-Gate 0 0 Not Found
> Sep 9 14:32:43 node1 modclusterd: shutdown succeeded
> Sep 9 14:32:43 node1 rgmanager: [25593]: Shutting down
> Cluster Service Manager...
> Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Stopping service > service:webby > Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record > for 192.168.0.30 on eth0. > Read from remote host node1: Connection reset by peer > . > . > . > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices > [this device CD/DVD] not SMART capable > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not > enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART > features > Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices > Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background > mode. New PID=3604. > Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer > on node1" (/services/sftp-ssh.service) successfully established. > Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader > E-Gate 0 0 Not Found > Sep 9 14:35:45 node1 last message repeated 3 times > Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000 > old: uncachable new: write-combining > Sep 9 14:35:46 node1 clurgmgrd: [3491]: Checking Existence Of > File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed > - File Doesn't Exist > > > > It seems that there problem in fencing device configuration. > Please find here my cluster.conf : > > > > > post_join_delay="3"/> > > > > > > > > > > > > > > > > > > > login="root" name="NODE1" passwd="redhat123"/> > login="root" name="NODE2" passwd="redhat123"/> > > > > restricted="1"> > > > > > > fstype="ext3" mountpoint="/var/www/html" name="docroot" > self_fence="0"/> > > server_root="/etc/httpd" shutdown_wait="5"/> > > name="webby" recovery="relocate"> > > > > > > > > ~ > > This is first time am working on Clustering so please help me. > Appreciate your help. > > Thank you. 
>
> From: Brem Belguebli
> To: linux clustering
> Sent: Thu, September 9, 2010 11:30:28 AM
> Subject: Re: [Linux-cluster] need help - Fencing problem
>
> try run this from another node of the cluster
>
> fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot
>
> Additionnally, by connecting thru http to the Ilo, you should be able to
> see Ilo logs (in the general tab) and see if it is due to a lack of
> licensing
>
> On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote:
> > Hello...
> >
> > I have already configure BIOS for iLO.. but am not sure why i don need
> > to shared ??
> > please anybody can help me out for this problem.
> > Do i need any extra setup for fencing device ?
> > thanks
> >
> > ______________________________________________________________________
> > From: ESGLinux < esggrupos at gmail.com >
> > To: linux clustering < linux-cluster at redhat.com >
> > Sent: Wed, September 8, 2010 2:57:25 PM
> > Subject: Re: [Linux-cluster] need help - Fencing problem
> >
> > Hello,
> >
> > Have you configured the iLO devices entering in the BIOS?
> >
> > I remenber I have to set up the user/pass in the iLO and marked the
> > iLo as not shared
> >
> > HTH,
> >
> > ESG
> >
> > 2010/9/8 Girish Prajapati < girishpati at yahoo.com >
> > Hello Everybody,
> > i am having problem of fencing a cluster node let me explain
> > indetail :
> > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and
> > iLO 2as fencing device. Am managing cluster through Luci -
> > (Conga). itseems everything is working fine. I can reboot
> > cluster nodes through Luci and service get transfer to another
> > node. After rebooting node connect to cluster automatically
> > without any error.
> > Problem is i can not do Fence this node through Luci, when i
> > try to fence any node i get following error :
> >
> > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo"
> > reports: Unable to connect/login to fencing device
> > Sep 8 14:51:16 node2 fence_node[9106]: Fence of
> > " node1.drctmb.com " was unsuccessful
> >
> > my iLO license is : iLO 2 Advanced Evaluation
> > Do i need to have license of iLO or there is problem in
> > configuration of cluster ?
> > how i can check cluster log in details.
> >
> > Appreciate your help.
> > Thank you in advance.
> >
> > Regards,
> > Girishkumar R Prajapati

From sagar.vipin at gmail.com Thu Sep 9 15:43:09 2010
From: sagar.vipin at gmail.com (vipin sagar)
Date: Thu, 9 Sep 2010 21:13:09 +0530
Subject: [Linux-cluster] RHCS: High Availabilty on SAP Application
Message-ID:

Hello There!

I am sure, quite a lot of people in here have worked on different types of Cluster setup. I myself worked on setting up ROCKS and MPICH on the HPC side.

Now I am looking for a head start on setting up an HA cluster for "SAP application", which includes ABAP and ABAP+JAVA application stack with MaxDB on RHAS-5.5.

If any of you worked with SAP on RHCS HA set up, please share your thoughts, inputs, best-practice or any kind of reference would be much grateful.
Already read www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf

Thank you for your time
~sagar

--
~O_0~
~sagar
http://vipinsagar.net

*...i've to look back when i heard a gong! i could only see a huge cobweb and its shining, just got wonder, what the time it was?5AgAr*

From lhh at redhat.com Thu Sep 9 17:59:05 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 09 Sep 2010 13:59:05 -0400
Subject: [Linux-cluster] Creating custom OCF_RESKEY_ variables, now how and what to refresh?
In-Reply-To: <4C581396.2020107@gmail.com>
References: <4C581396.2020107@gmail.com>
Message-ID: <1284055145.2207.16059.camel@ayanami.boston.devel.redhat.com>

On Tue, 2010-08-03 at 08:03 -0500, Dustin Henry Offutt wrote:
> Still an unsolved mystery why new "rules" written into one of the
> cluster scripts located in /usr/share/cluster on an RHEL5U5 cluster
> won't get recognized by the cluster software, if anyone has a clue...
> > Hello,
> >
> > Does anyone know how to force a cluster (the "Cluster Suite" as
> > released with the RHEL5.4 ISO, cman, rgmanager, et.al.) to recognize
> > that new OCF_RESKEY variables have been introduced in
> > a /usr/share/cluster/ script?
> > On one cluster the new variables are recognized and used by all
> > nodes. Same exact script, another cluster, just put the
> > updated /usr/share/cluster/ script in today, and it's like it hasn't
> > had something "refreshed", and doesn't see the new "rules," if that
> > makes any sense - despite bouncing the cluster suite and the cluster
> > nodes.

- needs to be mode 755
- update /etc/cluster/cluster.conf's version and run
    ccs_tool update /etc/cluster/cluster.conf
    cman_tool version -r
- remove all backup files from /usr/share/cluster -OR- chmod -x them.

Sorry for the late response.
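The checklist above could be sketched roughly as the shell fragment below. This is only an illustration under stated assumptions: the helper name `disable_backup_agents` and the backup patterns (`*~`, `*.bak`, `*.orig`) are hypothetical and not part of rgmanager, `<your-agent>` is a placeholder, and the cluster commands are shown as comments, copied from the list, because they need a live cluster.

```shell
#!/bin/bash
# Hypothetical helper: drop the execute bit from backup copies in an agent
# directory so that only the intended script stays executable.
# The three patterns are an assumption; adjust them to your own backup naming.
disable_backup_agents() {
    local dir=$1 f
    for f in "$dir"/*~ "$dir"/*.bak "$dir"/*.orig; do
        [ -f "$f" ] || continue      # pattern matched nothing, skip it
        chmod a-x "$f"
        echo "disabled: $f"
    done
}

# Steps from the list above, to be run by hand on a real cluster:
#   chmod 755 /usr/share/cluster/<your-agent>
#   (raise the version in /etc/cluster/cluster.conf)
#   ccs_tool update /etc/cluster/cluster.conf
#   cman_tool version -r
#   disable_backup_agents /usr/share/cluster
```

Stripping the execute bit rather than deleting keeps the backup copies around for comparison, which matches the "-OR- chmod -x them" option in the list.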
:/

-- Lon

From lhh at redhat.com Thu Sep 9 18:03:19 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 09 Sep 2010 14:03:19 -0400
Subject: [Linux-cluster] What does FAIL_STOP_WAIT state mean for clvmd and rgmanager
In-Reply-To:
References:
Message-ID: <1284055399.2207.16065.camel@ayanami.boston.devel.redhat.com>

On Mon, 2010-08-23 at 17:58 +1000, Joel Heenan wrote:
> Can someone please explain what this means and what you can do to get
> out of it:
>
> [root at cluster-host ~]# group_tool -v
> type  level name      id       state          node id        local_done
> fence 0     default   00010003 JOIN_STOP_WAIT 1    100050001 1
> [1 1 2 3 4]
> dlm   1     clvmd     00020003 FAIL_STOP_WAIT 2    200030003 1
> [1 2 3 4]
> dlm   1     rgmanager 00030003 FAIL_STOP_WAIT 2    200030003 1
> [1 2 3 4]

It looks like fencing has not completed. How do you have 2 node 1's in the fencing group?

-- Lon

From lhh at redhat.com Thu Sep 9 18:06:22 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 09 Sep 2010 14:06:22 -0400
Subject: [Linux-cluster] resource script vm.sh strange declare directive
In-Reply-To:
References:
Message-ID: <1284055582.2207.16071.camel@ayanami.boston.devel.redhat.com>

On Tue, 2010-08-24 at 17:57 +0200, brem belguebli wrote:
> Hi,
>
> After not being able to live migrate cluster resource vm's from one
> node to the other using clusvcadm, I've put some debug in vm.sh and it
> allowed me to see a strange variable assignement that I do not
> understand and that prevents live migration to occur.
>
> Rhel 5.5 /usr/share/cluster/vm.sh at line 790
> virsh_migrate()
>         declare $target=$1 <-- strange
>
> Rhel 5.4 in /usr/share/cluster/vm.sh at line 631
> virsh_migrate()
>         declare $target=$1 <-- Same declaration
>
> For information, when removing the $ before target, live migration
> works like a charm.

It should work either way. That variable assignment doesn't actually matter because of the way bash works.
The higher up function which calls virsh_migrate function declares $target (correctly) and passes it in as $1 to virsh_migrate.

Because $target's scope is actually global (declare does not create a 'local' variable; it creates a global one; the 'local' keyword creates a 'local' variable), the fact that there is a syntax error should not matter in this case.

So, you'll get a weird error if you run this from the console but it should not affect migration.

-- Lon

From lhh at redhat.com Thu Sep 9 18:10:02 2010
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 09 Sep 2010 14:10:02 -0400
Subject: [Linux-cluster] active/active NFS cluster question
In-Reply-To:
References:
Message-ID: <1284055802.2207.16078.camel@ayanami.boston.devel.redhat.com>

On Wed, 2010-09-01 at 16:40 -0400, Chris Walker wrote:
> Hello,
>
> I suspect that I'm doing something fairly stupid, but I'm having a
> problem with a cluster that is exporting the same GFS filesystems to
> the same nfs clients. Everything thing is fine until I relocate one
> of the nfs services to another machine. The relocation goes fine, but
> once I have two nfs services on the same machine, I can't get them
> apart. When I relocate one of the two nfs services to a different
> cluster host, the relocation wipes out the entries in
> /var/lib/nfs/etab, forcing the second nfs service on that node to
> relocate as well (I get the error "nfsclient:rc_nfs_clients is
> missing!").

Please delete the name=" " from the nfsclient lines; it might be a bug in luci, but you can't have "name" and "ref" in the same line. It will probably cause rgmanager to think the entire entry is missing in the best case.
>
> ref="rc_nfs_clients"/>
> ^^^^^

-- Lon

From brem.belguebli at gmail.com Thu Sep 9 18:37:29 2010
From: brem.belguebli at gmail.com (brem belguebli)
Date: Thu, 9 Sep 2010 20:37:29 +0200
Subject: [Linux-cluster] resource script vm.sh strange declare directive
In-Reply-To: <1284055582.2207.16071.camel@ayanami.boston.devel.redhat.com>
References: <1284055582.2207.16071.camel@ayanami.boston.devel.redhat.com>
Message-ID:

Hi Lon,

It did affect live migration. Once corrected migration worked like a charm.

On my FC13 box, the script is "correct"

Brem

2010/9/9 Lon Hohberger :
> On Tue, 2010-08-24 at 17:57 +0200, brem belguebli wrote:
>> Hi,
>>
>> After not being able to live migrate cluster resource vm's from one
>> node to the other using clusvcadm, I've put some debug in vm.sh and it
>> allowed me to see a strange variable assignement that I do not
>> understand and that prevents live migration to occur.
>>
>> Rhel 5.5 /usr/share/cluster/vm.sh at line 790
>> virsh_migrate()
>>         declare $target=$1 <-- strange
>>
>> Rhel 5.4 in /usr/share/cluster/vm.sh at line 631
>> virsh_migrate()
>>         declare $target=$1 <-- Same declaration
>>
>> For information, when removing the $ before target, live migration
>> works like a charm.
>
> It should work either way. That variable assignment doesn't actually
> matter because of the way bash works. The higher up function which
> calls virsh_migrate function declares $target (correctly) and passes it
> in as $1 to virsh_migrate.
>
> Because $target's scope is actually global (declare does not create a
> 'local' variable; it creates a global one; the 'local' keyword creates a
> 'local' variable), the fact that there is a syntax error should not
> matter in this case.
>
> So, you'll get a weird error if you run this from the console but it
> should not affect migration.
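A minimal sketch of the scoping question debated above, assuming bash. The function and node names are made up for illustration and this is not the actual vm.sh code. It shows that `declare $target=$1` expands `$target` before `declare` runs, so the callee only has a usable `target` if a caller already set one, whereas `declare target=$1` always sets it. As far as the bash manual describes it, `declare` inside a function makes the name local (like `local`), and a callee sees its caller's variables through dynamic scoping, which would explain why the `$` form can appear to work in one call path and fail in another.

```shell
#!/bin/bash
# Illustrative only: these functions are hypothetical, not taken from vm.sh.

migrate_buggy() {
    # $target is expanded first. With target empty this becomes
    # `declare =node2`, which bash rejects; target itself is never assigned.
    # `|| true` only keeps this sketch running after the rejected declare.
    declare $target=$1 2>/dev/null || true
    echo "${target:-<unset>}"
}

migrate_fixed() {
    # Assigns the first argument to a function-local variable named target.
    declare target=$1
    echo "${target:-<unset>}"
}

caller() {
    # A caller that sets target before calling: the callee can see it.
    declare target=$1
    migrate_buggy "$target"
}

unset target
migrate_buggy node2          # prints <unset>: nobody set target
migrate_fixed node2          # prints node2
caller node2                 # prints node2: caller's target is visible
echo "${target:-<unset>}"    # prints <unset>: nothing leaked to global scope
```

Whether this is what happened on the system described above is not something the sketch can show; it only demonstrates that the two forms are not equivalent when no caller has set `target`.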
>
> -- Lon

From anoop_rajkumar at merck.com Thu Sep 9 20:12:25 2010
From: anoop_rajkumar at merck.com (Rajkumar, Anoop)
Date: Thu, 9 Sep 2010 16:12:25 -0400
Subject: [Linux-cluster] Linux-cluster Digest, Vol 77, Issue 5
In-Reply-To:
References:
Message-ID:

Hi

It seems you are using hostname of cluster nodes at the place of hostname of ilo (ILO should have separate ip and hostname in DNS)

In below config is node1.drctmb.com assigned as hostname of node or the hostname of ILO device? It should be hostname of ilo device..

> login="root" name="NODE1" passwd="redhat123"/>
> login="root" name="NODE2" passwd="redhat123"/>

Thanks
Anoop

Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.

From brem.belguebli at gmail.com Fri Sep 10 08:14:17 2010
From: brem.belguebli at gmail.com (Brem Belguebli)
Date: Fri, 10 Sep 2010 10:14:17 +0200
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <744023.36408.qm@web120512.mail.ne1.yahoo.com>
References: <178789.16151.qm@web120516.mail.ne1.yahoo.com> <541653.54252.qm@web120512.mail.ne1.yahoo.com> <1284012028.3342.3.camel@newgen.localdomain> <744023.36408.qm@web120512.mail.ne1.yahoo.com>
Message-ID: <1284106457.3342.5.camel@newgen.localdomain>

hostname field in agent fence_ilo line must be the hostname (or IP addr) of the ILO not the one of the node.

Regards

> On Thu, 2010-09-09 at 00:43 -0700, Girish Prajapati wrote:
>
> To: linux clustering
> Subject: Re: [Linux-cluster] need help - Fencing problem
> Date: Thu, 9 Sep 2010 00:43:45 -0700 (PDT) (09/09/2010 09:43:45 AM)
>
> Hello,
> i can run following command successfully from another node but still
> getting same error message :

From girishpati at yahoo.com Fri Sep 10 09:38:05 2010
From: girishpati at yahoo.com (Girish Prajapati)
Date: Fri, 10 Sep 2010 02:38:05 -0700 (PDT)
Subject: [Linux-cluster] Linux-cluster Digest, Vol 77, Issue 5
In-Reply-To:
References:
Message-ID: <449355.23468.qm@web120508.mail.ne1.yahoo.com>

Hello Mr. Anoop,

I have already tried with different host & ilo name but am getting same error.
Please let me know if there is any other possibility for troubleshoot.

Thank you.
Regards, Girishkumar R Prajapati ________________________________ From: "Rajkumar, Anoop" To: linux-cluster at redhat.com Sent: Fri, September 10, 2010 1:42:25 AM Subject: Re: [Linux-cluster] Linux-cluster Digest, Vol 77, Issue 5 Hi It seems you are using hostname of cluster nodes at the place of hostname of ilo (ILO should have separate ip and hostname in DNS) In below config is node1.drctmb.com assigned as hostname of node or the hostname of ILO device? It should be hostname of ilo device.. > login="root" name="NODE1" passwd="redhat123"/> > login="root" name="NODE2" passwd="redhat123"/> > Thanks Anoop -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of linux-cluster-request at redhat.com Sent: Thursday, September 09, 2010 12:00 PM To: linux-cluster at redhat.com Subject: Linux-cluster Digest, Vol 77, Issue 5 Send Linux-cluster mailing list submissions to linux-cluster at redhat.com To subscribe or unsubscribe via the World Wide Web, visit https://www.redhat.com/mailman/listinfo/linux-cluster or, via email, send a message with subject or body 'help' to linux-cluster-request at redhat.com You can reach the person managing the list at linux-cluster-owner at redhat.com When replying, please edit your Subject line so it is more specific than "Re: Contents of Linux-cluster digest..." Today's Topics: 1. Re: need help - Fencing problem (ESGLinux) 2. Re: need help - Fencing problem (rhurst at bidmc.harvard.edu) 3. Re: need help - Fencing problem (Nehemias Jahcob) 4. Re: need help - Fencing problem (Ben Turner) ---------------------------------------------------------------------- Message: 1 Date: Thu, 9 Sep 2010 10:51:47 +0200 From: ESGLinux To: linux clustering Subject: Re: [Linux-cluster] need help - Fencing problem Message-ID: Content-Type: text/plain; charset="iso-8859-1" Hi, the only reason was that when I used as shared the speed of this device was very very low. 
Marked it as non-shared it works fine. I don?t know the reason. It was a try-error test, Greetings, ESG 2010/9/9 Jankowski, Chris > Why did you have to set iLO as non-shared? > > Thank and regards, > > Chris > > ------------------------------ > *From:* linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] *On Behalf Of *ESGLinux > *Sent:* Wednesday, 8 September 2010 22:57 > *To:* linux clustering > > *Subject:* Re: [Linux-cluster] need help - Fencing problem > > Hello, > > Have you configured the iLO devices entering in the BIOS? > > I remenber I have to set up the user/pass in the iLO and marked the iLo as > not shared > > > HTH, > > ESG > > 2010/9/8 Girish Prajapati > >> Hello Everybody, >> i am having problem of fencing a cluster node let me explain indetail : >> I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as >> fencing device. Am managing cluster through Luci - (Conga). itseems >> everything is working fine. I can reboot cluster nodes through Luci and >> service get transfer to another node. After rebooting node connect to >> cluster automatically without any error. >> Problem is i can not do Fence this node through Luci, when i try to fence >> any node i get following error : >> >> Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable >> to connect/login to fencing device >> Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was >> unsuccessful >> >> my iLO license is : iLO 2 Advanced Evaluation >> Do i need to have license of iLO or there is problem in configuration of >> cluster ? >> how i can check cluster log in details. >> >> Appreciate your help. >> Thank you in advance. 
>> >> Regards, >> Girishkumar R Prajapati >> >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Thu, 9 Sep 2010 09:34:20 -0400 From: To: Subject: Re: [Linux-cluster] need help - Fencing problem Message-ID: <50168EC934B8D64AA8D8DD37F840F3DE05640628E6 at EVS2CCR.its.caregroup.org> Content-Type: text/plain; charset="us-ascii" For what it is worth, our experiences with HP iLO management cards: iLO found on G1 servers does not need to be licensed, AFAIK, it does not have the option to do so anyways. iLO2 found on G2 and beyond does not need to be licensed either, if you are only using it as a fencing device. We licensed all of ours, because it enabled useful KVM with remote media capabilities that are superior than our Raritan KVM infrastructure. Both management cards should have their firmware updated -- they were both problematic to us as factory-shipped, but applying their update packs allowed them to work as advertised. Also, can't you add "-v" for verbose output and also something like "-D /tmp/fence.out" to save debugging info to an output file? It might help some to see where exactly the failure is occuring. Good luck. ________________________________ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Girish Prajapati Sent: Wednesday, September 08, 2010 6:06 AM To: Linux-cluster at redhat.com Subject: [Linux-cluster] need help - Fencing problem Hello Everybody, i am having problem of fencing a cluster node let me explain indetail : I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as fencing device. Am managing cluster through Luci - (Conga). 
itseems everything is working fine. I can reboot cluster nodes through Luci and service get transfer to another node. After rebooting node connect to cluster automatically without any error. Problem is i can not do Fence this node through Luci, when i try to fence any node i get following error : Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was unsuccessful my iLO license is : iLO 2 Advanced Evaluation Do i need to have license of iLO or there is problem in configuration of cluster ? how i can check cluster log in details. Appreciate your help. Thank you in advance. Regards, Girishkumar R Prajapati -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Thu, 9 Sep 2010 10:18:31 -0400 From: Nehemias Jahcob To: linux clustering Subject: Re: [Linux-cluster] need help - Fencing problem Message-ID: Content-Type: text/plain; charset="iso-8859-1" 1. ) You can increase the verbosity level for troubleshooting?? ---- * ----- #ccs_tool update /etc/cluster/cluster.conf Copy-paste /var/log/messages 2.) What version of PSP you have installed?? 3.) If nothing works, I recommend using fence_ipmi Greetings! 2010/9/9 > For what it is worth, our experiences with HP iLO management cards: > > iLO found on G1 servers does not need to be licensed, AFAIK, it does not > have the option to do so anyways. > > iLO2 found on G2 and beyond does not need to be licensed either, if you are > only using it as a fencing device. We licensed all of ours, because it > enabled useful KVM with remote media capabilities that are superior than our > Raritan KVM infrastructure. > > Both management cards should have their firmware updated -- they were both > problematic to us as factory-shipped, but applying their update > packs allowed them to work as advertised. 
> > Also, can't you add "-v" for verbose output and also something like "-D > /tmp/fence.out" to save debugging info to an output file? It might help > some to see where exactly the failure is occuring. Good luck. > > ------------------------------ > *From:* linux-cluster-bounces at redhat.com [mailto: > linux-cluster-bounces at redhat.com] *On Behalf Of *Girish Prajapati > *Sent:* Wednesday, September 08, 2010 6:06 AM > *To:* Linux-cluster at redhat.com > *Subject:* [Linux-cluster] need help - Fencing problem > > Hello Everybody, > i am having problem of fencing a cluster node let me explain indetail : > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and iLO 2as > fencing device. Am managing cluster through Luci - (Conga). itseems > everything is working fine. I can reboot cluster nodes through Luci and > service get transfer to another node. After rebooting node connect to > cluster automatically without any error. > Problem is i can not do Fence this node through Luci, when i try to fence > any node i get following error : > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > Sep 8 14:51:16 node2 fence_node[9106]: Fence of "node1.drctmb.com" was > unsuccessful > > my iLO license is : iLO 2 Advanced Evaluation > Do i need to have license of iLO or there is problem in configuration of > cluster ? > how i can check cluster log in details. > > Appreciate your help. > Thank you in advance. > > Regards, > Girishkumar R Prajapati > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ Message: 4 Date: Thu, 9 Sep 2010 11:58:45 -0400 (EDT) From: Ben Turner To: linux clustering Subject: Re: [Linux-cluster] need help - Fencing problem Message-ID: <155361964.174311284047925612.JavaMail.root at zmail07.collab.prod.int.phx2 .redhat.com> Content-Type: text/plain; charset=utf-8 Judging from: "Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device" Chances are you are not using the correct username/password/IP or the ilo is not configured for telnet logins. Try the following: 1. Login to the ilo via telnet from the command line. Be sure to use the username/password/IP you have in cluster.conf. 2. If that is successful try: # fence_ilo -v -a "Ilo IP from cluster.conf" -l "Ilo user from cluster.conf" -p "Ilo passwd from cluster.conf" -o status The -v will display exactly what the fence agent sees and is very useful for debugging failing fences. If the status fails send me the output. 3. If the fence_ilo successful try: # fence_node If all 3 are successful then fencing is setup properly and there may be a problem running it from Luci, if any of the 3 fail post the error back to the list and I'll look at it. -Ben ----- "Girish Prajapati" wrote: > Hello, > i can run following command successfully from another node but still > getting same error message : > > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot > > Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined: > Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the > primary component and will provide service. > Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL > state. 
> Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message > 192.168.0.28 > Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from > node 1 > Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster > member after 0 sec post_fail_delay > Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com" > Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed > Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com" > Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > > node1 rebooted and get connect to the cluster but now my webby service > not working see below log : > > Broadcast message from root (Thu Sep 9 14:32:41 2010): > The system is going down for system halt NOW! > Sep 9 14:19:22 node1 last message repeated 17 times > Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt > Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader > E-Gate 0 0 Not Found > Sep 9 14:32:43 node1 modclusterd: shutdown succeeded > Sep 9 14:32:43 node1 rgmanager: [25593]: Shutting down > Cluster Service Manager... > Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Stopping service > service:webby > Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record > for 192.168.0.30 on eth0. > Read from remote host node1: Connection reset by peer > . > . > . 
> Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices > [this device CD/DVD] not SMART capable > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not > enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART > features > Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices > Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background > mode. New PID=3604. > Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer > on node1" (/services/sftp-ssh.service) successfully established. > Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader > E-Gate 0 0 Not Found > Sep 9 14:35:45 node1 last message repeated 3 times > Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000 > old: uncachable new: write-combining > Sep 9 14:35:46 node1 clurgmgrd: [3491]: Checking Existence Of > File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed > - File Doesn't Exist > > > > It seems that there problem in fencing device configuration. > Please find here my cluster.conf : > > > > > post_join_delay="3"/> > > > > > > > > > > > > > > > > > > > login="root" name="NODE1" passwd="redhat123"/> > login="root" name="NODE2" passwd="redhat123"/> > > > > restricted="1"> > > > > > > fstype="ext3" mountpoint="/var/www/html" name="docroot" > self_fence="0"/> > > server_root="/etc/httpd" shutdown_wait="5"/> > > name="webby" recovery="relocate"> > > > > > > > > ~ > > This is first time am working on Clustering so please help me. > Appreciate your help. > > Thank you. 
> > > > From: Brem Belguebli > To: linux clustering > Sent: Thu, September 9, 2010 11:30:28 AM > Subject: Re: [Linux-cluster] need help - Fencing problem > > try run this from another node of the cluster > > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot > > > Additionnally, by connecting thru http to the Ilo, you should be able > to > see Ilo logs (in the general tab) and see if it is due to a lack of > licensing > > > On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote: > > Hello... > > > > I have already configure BIOS for iLO.. but am not sure why i don > need > > to shared ?? > > please anybody can help me out for this problem. > > Do i need any extra setup for fencing device ? > > thanks > > > > > > > > > ______________________________________________________________________ > > From: ESGLinux < esggrupos at gmail.com > > > To: linux clustering < linux-cluster at redhat.com > > > Sent: Wed, September 8, 2010 2:57:25 PM > > Subject: Re: [Linux-cluster] need help - Fencing problem > > > > Hello, > > > > > > Have you configured the iLO devices entering in the BIOS? > > > > > > I remenber I have to set up the user/pass in the iLO and marked the > > iLo as not shared > > > > > > > > > > HTH, > > > > > > ESG > > > > 2010/9/8 Girish Prajapati < girishpati at yahoo.com > > > Hello Everybody, > > i am having problem of fencing a cluster node let me explain > > indetail : > > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and > > iLO 2as fencing device. Am managing cluster through Luci - > > (Conga). itseems everything is working fine. I can reboot > > cluster nodes through Luci and service get transfer to another > > node. After rebooting node connect to cluster automatically > > without any error. 
> > Problem is i can not do Fence this node through Luci, when i > > try to fence any node i get following error : > > > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" > > reports: Unable to connect/login to fencing device > > Sep 8 14:51:16 node2 fence_node[9106]: Fence of > > " node1.drctmb.com " was unsuccessful > > > > my iLO license is : iLO 2 Advanced Evaluation > > Do i need to have license of iLO or there is problem in > > configuration of cluster ? > > how i can check cluster log in details. > > > > Appreciate your help. > > Thank you in advance. > > > > Regards, > > Girishkumar R Prajapati > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ------------------------------ -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster End of Linux-cluster Digest, Vol 77, Issue 5 ******************************************** Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. 
If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From girishpati at yahoo.com Fri Sep 10 09:32:35 2010
From: girishpati at yahoo.com (Girish Prajapati)
Date: Fri, 10 Sep 2010 02:32:35 -0700 (PDT)
Subject: [Linux-cluster] need help - Fencing problem
In-Reply-To: <155361964.174311284047925612.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
References: <155361964.174311284047925612.JavaMail.root@zmail07.collab.prod.int.phx2.redhat.com>
Message-ID: <505609.49142.qm@web120502.mail.ne1.yahoo.com>

Hello Sir,

The 1st and 2nd steps passed successfully. I also tried running the command with the iLO's name and it ran successfully, so there is no DNS issue.

i) When I try to run the fence_node command I get the following error:

[root at node1 ~]# fence_node node2.drctmb.com
agent "fence_ilo" reports: Unable to connect/login to fencing device

ii) When I try to fence through Luci I get the following error:

Sep 10 11:13:10 tmb luci[24270]: Unable to retrieve batch 1700106142 status from node2.drctmb.com:11111: fence_node failed:

Please let me know if there is any other way to troubleshoot this.

Thank you.

Regards,
Girishkumar

________________________________
From: Ben Turner
To: linux clustering
Sent: Thu, September 9, 2010 9:28:45 PM
Subject: Re: [Linux-cluster] need help - Fencing problem

Judging from:

"Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" reports: Unable to connect/login to fencing device"

Chances are you are not using the correct username/password/IP or the ilo is not configured for telnet logins. Try the following:

1. Login to the ilo via telnet from the command line. Be sure to use the username/password/IP you have in cluster.conf.

2.
If that is successful try: # fence_ilo -v -a "Ilo IP from cluster.conf" -l "Ilo user from cluster.conf" -p "Ilo passwd from cluster.conf" -o status The -v will display exactly what the fence agent sees and is very useful for debugging failing fences. If the status fails send me the output. 3. If the fence_ilo successful try: # fence_node If all 3 are successful then fencing is setup properly and there may be a problem running it from Luci, if any of the 3 fail post the error back to the list and I'll look at it. -Ben ----- "Girish Prajapati" wrote: > Hello, > i can run following command successfully from another node but still > getting same error message : > > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot > > Sep 9 14:37:00 node2 openais[2904]: [CLM ] Members Joined: > Sep 9 14:37:00 node2 openais[2904]: [SYNC ] This node is within the > primary component and will provide service. > Sep 9 14:37:00 node2 openais[2904]: [TOTEM] entering OPERATIONAL > state. > Sep 9 14:37:00 node2 openais[2904]: [CLM ] got nodejoin message > 192.168.0.28 > Sep 9 14:37:00 node2 openais[2904]: [CPG ] got joinlist message from > node 1 > Sep 9 14:37:00 node2 fenced[2923]: node1.drctmb.com not a cluster > member after 0 sec post_fail_delay > Sep 9 14:37:00 node2 fenced[2923]: fencing node "node1.drctmb.com" > Sep 9 14:37:10 node2 fenced[2923]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > Sep 9 14:37:10 node2 fenced[2923]: fence "node1.drctmb.com" failed > Sep 9 14:37:15 node2 fenced[2923]: fencing node "node1.drctmb.com" > Sep 9 14:37:26 node2 fenced[2923]: agent "fence_ilo" reports: Unable > to connect/login to fencing device > > node1 rebooted and get connect to the cluster but now my webby service > not working see below log : > > Broadcast message from root (Thu Sep 9 14:32:41 2010): > The system is going down for system halt NOW! 
> Sep 9 14:19:22 node1 last message repeated 17 times > Sep 9 14:32:41 node1 shutdown[25506]: shutting down for system halt > Sep 9 14:32:41 node1 pcscd: winscard.c:304:SCardConnect() Reader > E-Gate 0 0 Not Found > Sep 9 14:32:43 node1 modclusterd: shutdown succeeded > Sep 9 14:32:43 node1 rgmanager: [25593]: Shutting down > Cluster Service Manager... > Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Shutting down > Sep 9 14:32:43 node1 clurgmgrd[3457]: Stopping service > service:webby > Sep 9 14:32:44 node1 avahi-daemon[3378]: Withdrawing address record > for 192.168.0.30 on eth0. > Read from remote host node1: Connection reset by peer > . > . > . > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/hda, packet devices > [this device CD/DVD] not SMART capable > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, opened > Sep 9 14:35:42 node1 smartd[3585]: Device: /dev/sda, IE (SMART) not > enabled, skip device Try 'smartctl -s on /dev/sda' to turn on SMART > features > Sep 9 14:35:42 node1 smartd[3585]: Monitoring 0 ATA and 0 SCSI devices > Sep 9 14:35:42 node1 smartd[3604]: smartd has fork()ed into background > mode. New PID=3604. > Sep 9 14:35:42 node1 avahi-daemon[3412]: Service "SFTP File Transfer > on node1" (/services/sftp-ssh.service) successfully established. > Sep 9 14:35:45 node1 pcscd: winscard.c:304:SCardConnect() Reader > E-Gate 0 0 Not Found > Sep 9 14:35:45 node1 last message repeated 3 times > Sep 9 14:35:45 node1 kernel: mtrr: type mismatch for d8000000,2000000 > old: uncachable new: write-combining > Sep 9 14:35:46 node1 clurgmgrd: [3491]: Checking Existence Of > File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed > - File Doesn't Exist > > > > It seems that there problem in fencing device configuration. 
> Please find here my cluster.conf : > > > > > post_join_delay="3"/> > > > > > > > > > > > > > > > > > > > login="root" name="NODE1" passwd="redhat123"/> > login="root" name="NODE2" passwd="redhat123"/> > > > > restricted="1"> > > > > > > fstype="ext3" mountpoint="/var/www/html" name="docroot" > self_fence="0"/> > > server_root="/etc/httpd" shutdown_wait="5"/> > > name="webby" recovery="relocate"> > > > > > > > > ~ > > This is first time am working on Clustering so please help me. > Appreciate your help. > > Thank you. > > > > From: Brem Belguebli > To: linux clustering > Sent: Thu, September 9, 2010 11:30:28 AM > Subject: Re: [Linux-cluster] need help - Fencing problem > > try run this from another node of the cluster > > fence_ilo -a "Ilo IP" -l "Ilo user" -p "Ilo passwd" -o reboot > > > Additionnally, by connecting thru http to the Ilo, you should be able > to > see Ilo logs (in the general tab) and see if it is due to a lack of > licensing > > > On Wed, 2010-09-08 at 22:29 -0700, Girish Prajapati wrote: > > Hello... > > > > I have already configure BIOS for iLO.. but am not sure why i don > need > > to shared ?? > > please anybody can help me out for this problem. > > Do i need any extra setup for fencing device ? > > thanks > > > > > > > > > ______________________________________________________________________ > > From: ESGLinux < esggrupos at gmail.com > > > To: linux clustering < linux-cluster at redhat.com > > > Sent: Wed, September 8, 2010 2:57:25 PM > > Subject: Re: [Linux-cluster] need help - Fencing problem > > > > Hello, > > > > > > Have you configured the iLO devices entering in the BIOS? 
> > > > > > I remenber I have to set up the user/pass in the iLO and marked the > > iLo as not shared > > > > > > > > > > HTH, > > > > > > ESG > > > > 2010/9/8 Girish Prajapati < girishpati at yahoo.com > > > Hello Everybody, > > i am having problem of fencing a cluster node let me explain > > indetail : > > I have installed RHEL 5.4 on HP Prolaint DL280 G5 servers and > > iLO 2as fencing device. Am managing cluster through Luci - > > (Conga). itseems everything is working fine. I can reboot > > cluster nodes through Luci and service get transfer to another > > node. After rebooting node connect to cluster automatically > > without any error. > > Problem is i can not do Fence this node through Luci, when i > > try to fence any node i get following error : > > > > Sep 8 14:51:16 node2 fence_node[9106]: agent "fence_ilo" > > reports: Unable to connect/login to fencing device > > Sep 8 14:51:16 node2 fence_node[9106]: Fence of > > " node1.drctmb.com " was unsuccessful > > > > my iLO license is : iLO 2 Advanced Evaluation > > Do i need to have license of iLO or there is problem in > > configuration of cluster ? > > how i can check cluster log in details. > > > > Appreciate your help. > > Thank you in advance. 
> >
> > Regards,
> > Girishkumar R Prajapati

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

From Jost.Rakovec at snt.si Sat Sep 11 16:36:44 2010
From: Jost.Rakovec at snt.si (Rakovec Jost)
Date: Sat, 11 Sep 2010 18:36:44 +0200
Subject: [Linux-cluster] fence in xen
Message-ID: <3754ED14F3EE0C459DEFE2DF184515FF0F101C719C@SIMAIL.snt-is.com>

Hi list!

I have a question about fence_xvm.

The situation is: one physical server with Xen --> dom0 with 2 domUs. The cluster works fine between the domUs (reboot, relocate). I'm using Red Hat 5.5.

The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU is destroyed, but when it boots back up it can't join the cluster. The domU takes a very long time to boot --> FENCED_START_TIMEOUT=300

On the console I get the following after node2 is up:

node2:

INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
clurgmgrd     D 0000000000000010     0  2127   2126 (NOTLB)
 ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
 0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
 ffff880072009a48 ffffffff802649d7
Call Trace:
 [] _read_lock_irq+0x9/0x19
 [] filemap_nopage+0x193/0x360
 [] __mutex_lock_slowpath+0x60/0x9b
 [] .text.lock.mutex+0xf/0x14
 [] :dlm:dlm_new_lockspace+0x2c/0x860
 [] __up_read+0x19/0x7f
 [] __kmalloc+0x8f/0x9f
 [] :dlm:device_write+0x438/0x5e5
 [] vfs_write+0xce/0x174
 [] sys_write+0x45/0x6e
 [] tracesys+0xab/0xb6

During boot on node2:

Starting clvmd: dlm: Using TCP for communications
clvmd startup timed out
[FAILED]

node2:

[root at oelcl2 init.d]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 oelcl1           1 Online
 oelcl2           2 Online, Local

[root at oelcl2 init.d]#

On the first node:

[root at oelcl1 ~]# clustat
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 oelcl1           1 Online, Local, rgmanager
 oelcl2           2 Online, rgmanager

 Service Name     Owner (Last)     State
 ------- ----     ----- ------     -----
 service:webby    oelcl1           started

[root at oelcl1 ~]#

After that I have to destroy both domU guests and create them again to get node2 working.

I followed the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and http://sources.redhat.com/cluster/wiki/VMClusterCookbook

cluster config on dom0

cluster config on domU
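
The two clustat listings above disagree about rgmanager: node1 reports it on both members, while node2 reports it on neither. The following is a minimal sketch, not part of the original post, of how that comparison could be made mechanically. It assumes clustat member rows have the shape shown above (name, numeric ID, comma-separated status flags); the helper names parse_members and members_without are hypothetical, and the sample text is transcribed from the message.

```python
import re

# Sample clustat output, transcribed from the two views quoted in the message.
NODE2_VIEW = """\
Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 oelcl1           1 Online
 oelcl2           2 Online, Local
"""

NODE1_VIEW = """\
Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
Member Status: Quorate

 Member Name      ID   Status
 ------ ----      ---- ------
 oelcl1           1 Online, Local, rgmanager
 oelcl2           2 Online, rgmanager

 Service Name     Owner (Last)     State
 ------- ----     ----- ------     -----
 service:webby    oelcl1           started
"""

# Assumed row shape: <name> <numeric id> <comma-separated status flags>
MEMBER_ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(.+?)\s*$")


def parse_members(clustat_text):
    """Return {member_name: set_of_status_flags} from clustat-style text."""
    members = {}
    in_members = False
    for line in clustat_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Member Name"):
            in_members = True          # member table starts after this header
            continue
        if stripped.startswith("Service Name"):
            break                      # service table follows; stop here
        if not in_members or not stripped or stripped.startswith("-"):
            continue                   # skip preamble, blanks, dashed rule
        match = MEMBER_ROW.match(line)
        if match:
            name, _node_id, status = match.groups()
            members[name] = {flag.strip() for flag in status.split(",")}
    return members


def members_without(flag, clustat_text):
    """List members whose status column lacks the given flag."""
    parsed = parse_members(clustat_text)
    return sorted(name for name, flags in parsed.items() if flag not in flags)


if __name__ == "__main__":
    print("node2 view, no rgmanager:", members_without("rgmanager", NODE2_VIEW))
    print("node1 view, no rgmanager:", members_without("rgmanager", NODE1_VIEW))
```

Run on the samples, the node2 view lists both members as lacking the rgmanager flag and the node1 view lists none. This only restates what the listings show; it does not by itself explain why clurgmgrd is blocked.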