From jayesh.shinde at netcore.co.in  Mon Apr  2 15:01:54 2012
From: jayesh.shinde at netcore.co.in (jayesh.shinde)
Date: Mon, 02 Apr 2012 20:31:54 +0530
Subject: [Linux-cluster] 2 node cluster network query
Message-ID: <4F79BF62.2000505@netcore.co.in>

Hi all,

I am using a 2 node cluster with drbd. To carry the cluster packets and the drbd packets I have connected 3 cross cables between the 2 servers. The IPs and config are as follows:

service             mailbox1          mailbox2          Interface
------------------------------------------------------------------
1) drbd res0        10.10.10.10/16    10.10.10.20/16    eth1
2) drbd res1        10.10.10.30/16    10.10.10.40/16    eth2
3) Cluster packets  10.10.20.1/16     10.10.20.2/16     eth5

cat /etc/hosts
10.10.20.1   mailbox1
10.10.20.2   mailbox2

################# part of cluster.conf ########################

################# part of cluster.conf ########################

Cluster log:
----------------
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] New Configuration:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.20)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Left:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Joined:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] New Configuration:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.10)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.20)
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Left:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] Members Joined:
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ]     r(0) ip(10.10.10.10)
Apr  1 00:45:39 mailbox2 openais[4228]: [SYNC ] This node is within the primary component and will provide service.
Apr  1 00:45:39 mailbox2 openais[4228]: [TOTEM] entering OPERATIONAL state.
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] got nodejoin message 10.10.10.10
Apr  1 00:45:39 mailbox2 openais[4228]: [CLM  ] got nodejoin message 10.10.10.20

route -n:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 bond0
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth1
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth2
10.10.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth5
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth5
0.0.0.0         192.168.1.13    0.0.0.0         UG    0      0        0 bond0

Question:

While starting the cluster, the logs say "got nodejoin message 10.10.10.10 & 10.10.10.20", whereas /etc/hosts defines 10.10.20.1 <--> mailbox1 and 10.10.20.2 <--> mailbox2.

1) Why is the cluster not using the IPs 10.10.20.1 & 10.10.20.2 (i.e. the cross-cable IPs meant for cluster traffic)?
2) Is it because all my cross-cable IPs are in the 10.10.x.x range, and the cluster is picking the nearest matching IPs (10.10.10.10 & 10.10.10.20) for communication?
3) Do I need to use 172.16.x.x or some other IP range?

Regards
Jayesh Shinde
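This is the classic symptom of overlapping subnets: the node names resolve to 10.10.20.x, but with a /16 mask the derived network is 10.10.0.0, and totem binds to the first interface that falls inside that network (eth1, hence the 10.10.10.x addresses in the log). Keeping each link in its own subnet removes the ambiguity. A minimal sketch, assuming the drbd links can be renumbered and /24 masks are acceptable (persistent settings would go into the ifcfg-eth* files rather than ad-hoc ip commands):

# mailbox1 (mailbox2 analogous with .20 / .40 / .2)
ip addr add 10.10.10.10/24 dev eth1   # drbd res0
ip addr add 10.10.11.30/24 dev eth2   # drbd res1, renumbered out of 10.10.10.0/24
ip addr add 10.10.20.1/24  dev eth5   # cluster traffic, matches /etc/hosts

# after restarting cman, confirm which address totem actually bound to
cman_tool status | grep -i "node addresses"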
From jonathan.barber at gmail.com  Mon Apr  2 17:00:47 2012
From: jonathan.barber at gmail.com (Jonathan Barber)
Date: Mon, 2 Apr 2012 18:00:47 +0100
Subject: [Linux-cluster] Where to find information on HA-LVM
In-Reply-To: <036B68E61A28CA49AC2767596576CD597578CF33D0@GVW1113EXC.americas.hpqcorp.net>
References: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
	<036B68E61A28CA49AC2767596576CD597578CF33D0@GVW1113EXC.americas.hpqcorp.net>
Message-ID:

On 28 March 2012 06:51, Jankowski, Chris wrote:
> Ming,
>
> Could I ask you to publish the list of the most relevant information on HA-LVM
> that you'd find on this list, please? We'll all benefit.

I found the following to be informative when I first configured HA-LVM:

https://fedorahosted.org/cluster/wiki/LVMFailover#HALVM

Reading the LVM resource scripts was also useful:

/usr/share/cluster/lvm.sh
/usr/share/cluster/lvm_by_lv.sh
/usr/share/cluster/lvm_by_vg.sh

Cheers

> Chris Jankowski

-- 
Jonathan Barber
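As a concrete starting point, a rough sketch of the rgmanager side of an HA-LVM failover service; the volume group, logical volume, mount point and service names below are invented for illustration, and the matching /etc/lvm/lvm.conf volume_list (or clvmd) setup described on the LVMFailover page above is still required:

<!-- excerpt for /etc/cluster/cluster.conf, inside <rm>; all names are hypothetical -->
<resources>
    <lvm name="halvm_res" vg_name="havg" lv_name="halv"/>
    <fs name="hafs_res" device="/dev/havg/halv" mountpoint="/data" fstype="ext4" force_unmount="1"/>
</resources>
<service name="halvm_svc" autostart="1" recovery="relocate">
    <lvm ref="halvm_res"/>
    <fs ref="hafs_res"/>
</service>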
From ajb2 at mssl.ucl.ac.uk  Tue Apr  3 13:14:09 2012
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Tue, 03 Apr 2012 14:14:09 +0100
Subject: [Linux-cluster] caching of san devices....
Message-ID: <4F7AF7A1.3000701@mssl.ucl.ac.uk>

Real Dumb Question[tm] time....

Has anyone tried putting bcache/flashcache in front of shared storage in a GFS2 cluster (on each node, of course)?

Did it work?
Should it work?
Is it safe?
Are there ways of making it safe?
Am I mad for thinking about it?

Rationale:

Spinning disks are slow to seek, large arrays even more so.

As soon as there's a significant load on our GFS2 cluster, the random IO limitations of the SAN hardware become the single most important factor limiting performance.

Only "so much" RAM can be installed in any hardware to increase page and dentry caching before physical limits are hit.

SSD SAN arrays are hideously expensive and can't always be justified to "the powers that be". Universities are always tightly funded, but there are many other entities facing similar problems.

From swhiteho at redhat.com  Tue Apr  3 13:28:47 2012
From: swhiteho at redhat.com (Steven Whitehouse)
Date: Tue, 03 Apr 2012 14:28:47 +0100
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
Message-ID: <1333459727.2702.21.camel@menhir>

Hi,

On Tue, 2012-04-03 at 14:14 +0100, Alan Brown wrote:
> Has anyone tried putting bcache/flashcache in front of shared storage in
> a GFS2 cluster (on each node, of course)?
>
> Spinning disks are slow to seek, large arrays even more so.

Large arrays should be much faster, provided the data is in cache.

I can't see any mention that bcache supports clusters at all. I don't think that it is likely to work. Certainly the web page I found suggests that it doesn't support barriers (silently dropped), but I'm not sure whether that refers to "real" barriers or the flush-based system that we use now. I'd be very surprised if that would work.

What do you mean by flashcache? This perhaps:
http://www.netapp.com/uk/products/storage-systems/flash-cache/

It looks like a hardware implementation of the same thing, and I can't see anything to suggest that it is cluster aware on a first reading of the docs,

Steve.

From ajb2 at mssl.ucl.ac.uk  Tue Apr  3 14:55:15 2012
From: ajb2 at mssl.ucl.ac.uk (Alan Brown)
Date: Tue, 03 Apr 2012 15:55:15 +0100
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <1333459727.2702.21.camel@menhir>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk> <1333459727.2702.21.camel@menhir>
Message-ID: <4F7B0F53.30402@mssl.ucl.ac.uk>

On 03/04/12 14:28, Steven Whitehouse wrote:
>> Spinning disks are slow to seek, large arrays even more so.
>
> Large arrays should be much faster, provided the data is in cache.

Or not, when there's a lot of random IO involved and it's not in cache. I'm talking about arrays such as Nexsan ATAbeasts (a drawer full of SATA drives).

> I can't see any mention that bcache supports clusters at all. I don't
> think that it is likely to work. Certainly the web page I found suggests
> that it doesn't support barriers (silently dropped)

It doesn't, and there are specific warnings to disable barriers on ext4 and friends when using it.

Bcache is writethrough by default. Writeback can be enabled but is beta quality, and I think it would conflict badly with clustered filesystems.

> What do you mean by flashcache? This perhaps:

Facebook's caching implementation, which is almost like bcache but much simpler in its implementation.

> http://www.netapp.com/uk/products/storage-systems/flash-cache/
>
> It looks like a hardware implementation of the same thing, and I can't
> see anything to suggest that it is cluster aware on a first reading of
> the docs,

There are a few SAN-level accelerators, but the cost of those things starts around $20,000 and climbs from there.

From florian at hastexo.com  Tue Apr  3 18:15:15 2012
From: florian at hastexo.com (Florian Haas)
Date: Tue, 3 Apr 2012 20:15:15 +0200
Subject: [Linux-cluster] caching of san devices....
In-Reply-To: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
References: <4F7AF7A1.3000701@mssl.ucl.ac.uk>
Message-ID:

On Tue, Apr 3, 2012 at 3:14 PM, Alan Brown wrote:
> Has anyone tried putting bcache/flashcache in front of shared storage in a
> GFS2 cluster (on each node, of course)?

I can't talk about bcache, but I have worked with flashcache a bit, and there's a presentation of mine on how to use it in clustering at
http://www.hastexo.com/resources/presentations/storage-replication-high-performance-high-availability-environments
(which is all about Pacemaker, though). But for GFS2 specifically:

> Did it work?

It won't.

> Should it work?

No.

> Is it safe?

No. There's no cluster awareness the way you envision it, and there's no way to do multi-master replication of the flashcache cache device, which you would need.

> Are there ways of making it safe?

Implement the above, and it might be. (You don't want to.)

> Am I mad for thinking about it?

Ahum, well, now that you mention it... ;)

> Rationale:
>
> Spinning disks are slow to seek, large arrays even more so.
> SSD SAN arrays are hideously expensive and can't always be justified to "the
> powers that be".

I think you've got two possibilities:

1. Stick SSD based caching into your SAN. Google for CacheCade or MaxCache for some vendor implementations.

2. Consider ditching your GFS2 for SSD based GlusterFS replication.

I realize option 2 may get me booed off the list, and I know nothing about your requirements other than what you posted here, but if you just want something that is writable from all nodes and frees you from your SAN, then that might be a possibility.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
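To make option 2 slightly more concrete, a minimal sketch of a two-node replicated GlusterFS volume on SSD-backed bricks; hostnames, device names, brick paths and the mount point are all invented, and this only illustrates the idea rather than being a drop-in replacement for the GFS2 setup discussed above:

# on both nodes: prepare an SSD-backed brick (device name assumed)
mkfs.xfs /dev/sdb1
mkdir -p /bricks/gv0
mount /dev/sdb1 /bricks/gv0

# on node1: form the trusted pool and create a 2-way replicated volume
gluster peer probe node2
gluster volume create gv0 replica 2 transport tcp node1:/bricks/gv0 node2:/bricks/gv0
gluster volume start gv0

# on every node that needs the data: mount it via the FUSE client
mount -t glusterfs node1:/gv0 /mnt/gv0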
From jstoner at opsource.net  Tue Apr  3 19:58:05 2012
From: jstoner at opsource.net (Jeff Stoner)
Date: Tue, 3 Apr 2012 15:58:05 -0400
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To: <4F75F7A9.9030608@alteeve.ca>
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID:

Any luci devs on the list? I'm looking for info on integrating fencing agents into luci. Once the Powers That Be allow me to release our fencing agent, I'd like to take a stab at making it easier to use via luci.

On Fri, Mar 30, 2012 at 2:12 PM, Digimer wrote:
> On 03/30/2012 10:00 AM, Jonathan Barber wrote:
> > I'm writing a fencing agent and would like to know whether there is a document
> > describing the interface that fencing agents should support, i.e. how
> > arguments are passed to the fence agent, what the exit codes represent, and
> > whether anything is done with the agent's standard out/error.
> >
> > I've looked at the agents that ship with RHCS and have some idea of
> > what's going on, but it'd be nice to have the documentation to confirm
> > my suspicions.
>
> Check out this: https://fedorahosted.org/cluster/wiki/FenceAgentAPI
>
> If you need help/clarity, let me know.
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.com

-- 
Jeff Stoner | Cloud Evangelist
O +1-703-668-1920 | M +1-703-475-7720 | E jstoner at opsource.net
OpSource, Inc. | www.opsource.net | Twitter @opsource.net
Red Hat Certified Engineer (cert number 805009770342158)

From lists at alteeve.ca  Tue Apr  3 23:00:27 2012
From: lists at alteeve.ca (Digimer)
Date: Tue, 03 Apr 2012 16:00:27 -0700
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To:
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID: <4F7B810B.7070608@alteeve.ca>

On 04/03/2012 12:58 PM, Jeff Stoner wrote:
> Any luci devs on the list? I'm looking for info on integrating fencing
> agents into luci. Once the Powers That Be allow me to release our
> fencing agent, I'd like to take a stab at making it easier to use via luci.

Barring a correction from someone more in the know... I believe that, so long as your fence agent outputs its metadata properly, luci should use it. The trick is getting it added to the fence-agents RPM, which I can help with.

-- 
Digimer
Papers and Projects: https://alteeve.com
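For anyone writing an agent against the FenceAgentAPI page linked above: fenced passes the options as name=value lines on the agent's standard input, and exit status 0 signals success. A bare-bones sketch of that convention; the agent name is made up and the device-specific commands are placeholders, and real agents additionally accept the same options as command-line flags and print full metadata:

#!/bin/bash
# fence_example -- skeleton of the stdin-based calling convention, not a real agent
action="reboot"                        # default action described by the API page
while read -r line; do                 # fenced feeds name=value pairs on stdin
    case "$line" in
        action=*|option=*) action="${line#*=}" ;;   # "option" is the older key name
        port=*)            port="${line#*=}" ;;
        ipaddr=*)          ipaddr="${line#*=}" ;;
    esac
done

case "$action" in
    metadata)
        echo '<?xml version="1.0" ?>'  # a real agent prints its full <resource-agent> XML here
        exit 0 ;;
    status)
        # query the device for $port on $ipaddr here; the wiki documents the on/off exit codes
        exit 0 ;;
    on|off|reboot)
        # call the device-specific power command for $port on $ipaddr here
        exit 0 ;;
    *)  exit 1 ;;
esac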
From fdinitto at redhat.com  Wed Apr  4 04:46:16 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 04 Apr 2012 06:46:16 +0200
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To:
References: <4F75F7A9.9030608@alteeve.ca>
Message-ID: <4F7BD218.60109@redhat.com>

On 04/03/2012 09:58 PM, Jeff Stoner wrote:
> Any luci devs on the list? I'm looking for info on integrating fencing
> agents into luci. Once the Powers That Be allow me to release our
> fencing agent, I'd like to take a stab at making it easier to use via luci.

Let's get the agent upstream first and in good shape (license, metadata output, man pages and all of that), then adding it to luci is "simple".

Fabio

From lists at alteeve.ca  Wed Apr  4 04:58:04 2012
From: lists at alteeve.ca (Digimer)
Date: Tue, 03 Apr 2012 21:58:04 -0700
Subject: [Linux-cluster] Documentation about the fence agent interface to RHCS?
In-Reply-To: <4F7BD218.60109@redhat.com>
References: <4F75F7A9.9030608@alteeve.ca> <4F7BD218.60109@redhat.com>
Message-ID: <4F7BD4DC.2090508@alteeve.ca>

Exactly the person I was hoping would chime in. :)

On 04/03/2012 09:46 PM, Fabio M. Di Nitto wrote:
> Let's get the agent upstream first and in good shape (license, metadata
> output, man pages and all of that), then adding it to luci is "simple".
>
> Fabio

-- 
Digimer
Papers and Projects: https://alteeve.com
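Since "metadata output" is one of the requirements mentioned above: the metadata action is expected to print an XML description of the agent and its parameters, which is what tools such as luci use to build their configuration forms. A rough sketch of the shape, with an invented agent name and a single parameter (compare the output of any shipped agent, e.g. fence_ipmilan -o metadata, for the full format):

<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Fence agent for an example power switch">
    <longdesc>fence_example powers cluster nodes off and on through a hypothetical device.</longdesc>
    <parameters>
        <parameter name="ipaddr" unique="1" required="1">
            <getopt mixed="-a, --ip=[address]"/>
            <content type="string"/>
            <shortdesc lang="en">IP address or hostname of the device</shortdesc>
        </parameter>
    </parameters>
    <actions>
        <action name="on"/>
        <action name="off"/>
        <action name="reboot"/>
        <action name="status"/>
        <action name="metadata"/>
    </actions>
</resource-agent>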
From parvez.h.shaikh at gmail.com  Wed Apr  4 05:41:37 2012
From: parvez.h.shaikh at gmail.com (Parvez Shaikh)
Date: Wed, 4 Apr 2012 11:11:37 +0530
Subject: [Linux-cluster] Multicast address by CMAN
Message-ID:

Hi all,

As per my understanding, CMAN uses the cluster name to internally generate a multicast address. In my cluster.conf:

Having a cluster with the same name on a given network leads to issues and is undesirable.

I want to know whether there is any way to find out if a multicast address is already in use by some other cluster, so as to avoid using a name that generates the same multicast IP, or for that matter configuring the same multicast IP in cluster.conf.

Thanks,
Parvez

From fdinitto at redhat.com  Wed Apr  4 07:02:23 2012
From: fdinitto at redhat.com (Fabio M. Di Nitto)
Date: Wed, 04 Apr 2012 09:02:23 +0200
Subject: [Linux-cluster] Multicast address by CMAN
In-Reply-To:
References:
Message-ID: <4F7BF1FF.4040203@redhat.com>

On 4/4/2012 7:41 AM, Parvez Shaikh wrote:
> I want to know whether there is any way to find out if a multicast address
> is already in use by some other cluster, so as to avoid using a name that
> generates the same multicast IP, or for that matter configuring the same
> multicast IP in cluster.conf.

cman_tool status will show the multicast address in use by a given cluster.

Fabio

From emi2fast at gmail.com  Wed Apr  4 07:11:58 2012
From: emi2fast at gmail.com (emmanuel segura)
Date: Wed, 4 Apr 2012 09:11:58 +0200
Subject: [Linux-cluster] Multicast address by CMAN
In-Reply-To:
References:
Message-ID:

One simple way is netstat -gn on a different cluster.

On 4 April 2012 07:41, Parvez Shaikh wrote:
> I want to know whether there is any way to find out if a multicast address
> is already in use by some other cluster...

-- 
this is my life and I live it for as long as God wills
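Putting the two suggestions together, and adding the option of pinning the address explicitly per cluster (the 239.192.x.x value below is only an illustrative pick from the administratively scoped range):

# on a node of the existing cluster: which address did cman derive from the cluster name?
cman_tool status | grep -i multicast

# which multicast groups are the local interfaces already joined to?
netstat -gn

# or take the guesswork away and pin one address per cluster in /etc/cluster/cluster.conf:
#   <cman>
#       <multicast addr="239.192.100.1"/>
#   </cman>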
From schlegel at riege.com  Wed Apr  4 12:25:53 2012
From: schlegel at riege.com (Gunther Schlegel)
Date: Wed, 04 Apr 2012 14:25:53 +0200
Subject: [Linux-cluster] qdiskd in heuristics mode only?
Message-ID: <4F7C3DD1.5060208@riege.com>

Hi,

is there any way to prevent fencing if the qdisk quorum partition can't be accessed? (Yes, that does make sense!)

Scenario is like this:

- 2 node cluster, RHEL6.2, internal data storage (mysql multi-master replication, no GFS involved)
- qdiskd is in place for two reasons: 1) I need to run some heuristics, 2) to gather quorum if only one node starts up
- quorum partition is on an iSCSI SAN
- SAN storage is not required for the cluster services to operate at all (leaving aside that it should work at node startup; but if the iSCSI link goes down later on, there is no need to actually fence a node as long as the network cluster communication between the two nodes is fine)

SAN firmware upgrades interrupt the iSCSI storage for about 40 seconds (multipathing et al. is properly set up and working fine, the SAN controller failover just takes that long). To mitigate that I need to set quite big totem consensus timeouts. I do not like that, but OK. But qdiskd keeps on fencing the nodes as soon as quorum partition access is restored.

Is there any hidden setting to prevent that?

best regards,
Gunther

-- 
Gunther Schlegel
Head of IT Infrastructure
.............................................................
Riege Software International GmbH    Phone: +49 2159 91480
Mollsfeld 10                         Fax:   +49 2159 914811
40670 Meerbusch                      Web:   www.riege.com
Germany                              E-Mail: schlegel at riege.com

Commercial Register: Amtsgericht Neuss HRB-NR 4207
VAT Reg No.: DE120585842
Managing Directors: Christian Riege, Gabriele Riege, Johannes Riege, Tobias Riege
.............................................................
YOU CARE FOR FREIGHT, WE CARE FOR YOU
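There does not appear to be a qdiskd switch that simply skips fencing when the quorum device vanishes, so the usual approach is to make qdiskd itself tolerate an outage longer than the firmware-upgrade window. A hedged sketch for cluster.conf follows; the numbers only illustrate the knobs involved (interval*tko here gives roughly 60 s of tolerance for the ~40 s iSCSI blackout, and the heuristic target address is hypothetical), and the exact relationship between interval*tko, quorum_dev_poll and the totem token timeout is spelled out in qdisk(5) and the Red Hat cluster documentation:

<!-- illustrative values only -->
<quorumd interval="3" tko="20" votes="1" label="qdisk" min_score="1">
    <heuristic program="ping -c1 -w1 10.0.0.254" score="1" interval="2" tko="10"/>
</quorumd>
<cman quorum_dev_poll="80000" expected_votes="3"/>
<totem token="80000"/>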
From wsfax.alu.es at gmail.com  Thu Apr  5 22:19:15 2012
From: wsfax.alu.es at gmail.com (wsfax alu.es)
Date: Fri, 6 Apr 2012 00:19:15 +0200
Subject: [Linux-cluster] Cluster failure, dlm overload
Message-ID:

Hi,

First of all, thanks for your time.

A five node cluster sharing several GFS filesystems is suffering total blocks of filesystem activity, around one block each week. These blocks first appeared several weeks ago, after more than three years in service. The cluster is restored after a restart of all cluster nodes ;-)

When these blocks appear, we can see the dlm send and receive processes with a high level of CPU consumption, and network traffic is also ten times the normal level. A capture (wireshark) of network traffic on the DLM port shows thousands of messages per second. In particular, all "request message" packets are answered with a "request reply" where errno=EBADR; lookup messages seem OK.

The cluster is running a somewhat outdated software version, the Red Hat 2.6.18 one, and it is not possible to upgrade easily.

Any suggestion is welcome.

Kind regards,
ALU

From dbourque at accuweather.com  Fri Apr  6 15:17:11 2012
From: dbourque at accuweather.com (Daniel Bourque)
Date: Fri, 6 Apr 2012 15:17:11 +0000
Subject: [Linux-cluster] checking syntax errors default_event_script.sl
Message-ID: <9B922D75F5EA6A43AB899119482AB5E821072A88@exch-db02.accu.accuwx.com>

Hi,

Background: I'm working on adding load balancing via RIND. I discovered that not every event is passed to the event scripts defined in cluster.conf, so I have to modify /usr/share/cluster/default_event_script.sl.

In order not to have to restart rgmanager all the time, I changed /usr/share/cluster/default_event_script.sl so that it contains only this:

evalfile("//default_event_script.sl");

This allows me to change the main RIND script live.

The problem: I would like to be able to work on a copy and do syntax error checks via "slsh -t" before overwriting the live one. I can't simply do that, because slsh doesn't find the definitions for all the functions and variables used in default_event_script.sl.

Where are the libraries I need to include in my SLSH_PATH?

Thanks!

-- 
Daniel Bourque
Sr. Systems Engineer
AccuWeather
Office (316) 266-8013
Office (316) 266-8000 ext. 8013
Mobile (316) 640-1024

From ming-ming.chen at hp.com  Fri Apr  6 17:32:32 2012
From: ming-ming.chen at hp.com (Chen, Ming Ming)
Date: Fri, 6 Apr 2012 17:32:32 +0000
Subject: [Linux-cluster] fail to enable the vm in a cluster with vm service
In-Reply-To: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
References: <1D241511770E2F4BA89AFD224EDD527117B82078@G9W0737.americas.hpqcorp.net>
Message-ID: <1D241511770E2F4BA89AFD224EDD527117B90213@G9W0737.americas.hpqcorp.net>

Hi,

I have a two node cluster with a vm service. I can migrate the vm using clusvcadm (clusvcadm -M vm:vm297 -m node2). However, if I use "clusvcadm -d vm:vm297" to disable/stop the vm and then try to use "clusvcadm -e vm:vm297" to start it again, it fails. I can, however, manually create vm297 using virsh (virsh create /abc/config/vm297.xml).

Any help and comments will be appreciated. Thanks in advance.

Ming

The following messages are from the rgmanager.log file:

**** Stop the vm vm297 successfully *****
Apr 06 09:35:57 rgmanager 1 events processed
Apr 06 09:36:05 rgmanager Stopping service vm:vm297
Apr 06 09:36:05 rgmanager [vm] Using /abc/config//vm297.xml
Apr 06 09:36:26 rgmanager 1 events processed
Apr 06 09:36:39 rgmanager [script] Executing /etc/init.d/libvirtd status

****** start the vm297 failed ********
Apr 06 09:36:49 rgmanager No other nodes have seen vm:vm297
Apr 06 09:36:49 rgmanager Starting disabled service vm:vm297
Apr 06 09:36:49 rgmanager [vm] Using /abc/config//vm297.xml
Apr 06 09:36:49 rgmanager [vm] /abc/config//vm297.xml is XML; using virsh
Apr 06 09:36:49 rgmanager [vm] virsh create /abc/config//vm297.xml
Apr 06 09:36:49 rgmanager start on vm "vm297" returned 1 (generic error)
Apr 06 09:36:49 rgmanager #68: Failed to start vm:vm297; return value: 1
Apr 06 09:36:49 rgmanager Stopping failed service vm:vm297
Apr 06 09:36:49 rgmanager Stopping service vm:vm297
Apr 06 09:36:49 rgmanager [vm] Virtual machine vm297 is
Apr 06 09:36:49 rgmanager Service vm:vm297 is recovering
Apr 06 09:36:49 rgmanager #71: Relocating failed service vm:vm297
Apr 06 09:36:49 rgmanager Service vm:vm297 is stopped
Apr 06 09:36:55 rgmanager 2 events processed

My cluster.conf file is: