From anasnajj at gmail.com Wed Jun 2 02:42:20 2010
From: anasnajj at gmail.com (Anas Alnajjar)
Date: Wed, 2 Jun 2010 05:42:20 +0300
Subject: [Linux-cluster] check status time out
Message-ID: <000301cb01fd$3b2bc8f0$b1835ad0$@com>

Dear all,

Hi, I hope you are enjoying life!

I have a Red Hat cluster on CentOS 5.4, and I created a script resource to handle my service "/etc/init.d/xxxx", but I need to change the status check timeout because my service takes a long time to return its status. How can I do this?

BR

From glisha at gmail.com Wed Jun 2 14:36:43 2010
From: glisha at gmail.com (Georgi Stanojevski)
Date: Wed, 2 Jun 2010 16:36:43 +0200
Subject: [Linux-cluster] check status time out
In-Reply-To: <000301cb01fd$3b2bc8f0$b1835ad0$@com>
References: <000301cb01fd$3b2bc8f0$b1835ad0$@com>
Message-ID:

According to /usr/share/cluster/script.sh, you can't set a timeout for the status check. So I guess it waits indefinitely for the status script to return?

Are you sure you need to increase the timeout? Does rgmanager kill your resource because it runs for a long time, or because the status check returns a non-zero exit code?

I have just the opposite problem. If my status check doesn't return within, e.g., 60 s, I need to restart the service, and according to the comments in script.sh I can't do that?

--
Glisha

From dhoffutt at gmail.com Wed Jun 2 15:50:09 2010
From: dhoffutt at gmail.com (Dustin Henry Offutt)
Date: Wed, 2 Jun 2010 10:50:09 -0500
Subject: [Linux-cluster] check status time out
In-Reply-To:
References: <000301cb01fd$3b2bc8f0$b1835ad0$@com>
Message-ID:

Life is absolutely enjoyable! Hope yours is as well!
In such a situation, one might consider calling a custom wrapper script instead. Have the custom script do something like this "thought script":

    myTimeOut = 60 seconds? 120 seconds?

    start {
        /etc/init.d/myService start
        date +%s > /var/lock/subsys/customScriptStartTimeStamp   # start time in epoch seconds
    }

    stop {
        /etc/init.d/myService stop
    }

    status {
        serviceStartedAt = $(cat /var/lock/subsys/customScriptStartTimeStamp)
        if (now - serviceStartedAt > myTimeOut) {
            return the exit code of "service myService status"
        } else {
            return 0
        }
    }

So the wrapper won't start querying the real service for its status until myTimeOut seconds have passed since the service was started. Just an idea...

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
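One way the thought script above could be made runnable is the following bash sketch. It keeps the start-up grace period and adds a bounded status call for the opposite case raised earlier in the thread: if the real status has not returned after STATUS_TIMEOUT seconds, the wrapper returns 1, which rgmanager should then treat as a failed status check. REAL_INIT, STAMP, GRACE, STATUS_TIMEOUT, the function names and the default values are all hypothetical; /etc/init.d/myService is the placeholder from the thought script. It is a sketch to adapt, not a supported rgmanager agent.

```shell
#!/bin/bash
# Sketch of a wrapper for an rgmanager <script> resource: a start-up grace
# period plus a bounded status check. All names and defaults below are
# hypothetical; adjust them for your site.

REAL_INIT=${REAL_INIT:-/etc/init.d/myService}                # the real init script
STAMP=${STAMP:-/var/lock/subsys/customScriptStartTimeStamp}  # last start time
GRACE=${GRACE:-120}                   # seconds after start during which status reports OK
STATUS_TIMEOUT=${STATUS_TIMEOUT:-60}  # seconds before a hung status counts as a failure

# Run "$@" in the background and return its exit code. If it is still running
# after roughly $1 seconds (polled once per second), send it SIGTERM and return 1.
run_with_timeout() {
    local limit=$1 pid elapsed=0
    shift
    "$@" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$pid"
}

do_start() {
    "$REAL_INIT" start || return   # propagate a failed start unchanged
    date +%s > "$STAMP"            # remember the start time in epoch seconds
}

do_stop() {
    local rc
    "$REAL_INIT" stop
    rc=$?
    rm -f "$STAMP"
    return "$rc"
}

do_status() {
    local started now
    started=$(cat "$STAMP" 2>/dev/null) || started=0
    now=$(date +%s)
    # Within the grace period: report "running" without asking the service.
    if [ $((now - started)) -lt "$GRACE" ]; then
        return 0
    fi
    # Otherwise pass through the real status, but never wait forever for it.
    run_with_timeout "$STATUS_TIMEOUT" "$REAL_INIT" status
}

# Assumed to be called with start, stop or status; the script's exit code
# is the exit code of the selected function.
case "${1:-}" in
    start)  do_start ;;
    stop)   do_stop ;;
    status) do_status ;;
    "")     : ;;  # no action: only define the functions (e.g. when sourced)
    *)      echo "usage: $0 {start|stop|status}" >&2; exit 2 ;;
esac
```

To use it, the script resource's file attribute would point at the wrapper instead of at the real init script. The timeout is approximate because it is polled once per second, and SIGTERM only reaches the init script's own process, so any helpers it started may keep running.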
From kitgerrits at gmail.com Wed Jun 2 21:16:36 2010
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Wed, 2 Jun 2010 23:16:36 +0200
Subject: [Linux-cluster] check status time out
In-Reply-To:
Message-ID: <4c06ca34.1067f10a.4fa2.6595@mx.google.com>

You can also try adjusting script.sh.

From http://sources.redhat.com/cluster/wiki/FAQ/RGManager#rgm_svcstart:

How can I change the interval at which rgmanager checks a given service?

The interval is set in the resource agent script for each resource type, in /usr/share/cluster/. It's easier to just change the script.sh file to use whatever value you want (intervals below 5 seconds are not supported, though).

Checking is per resource type, not per service, because some resource types take more system time to check than others. That is, a check on a "script" might happen only every 30 seconds, while a check on an "ip" might happen every 10 seconds.

The status checks are not supposed to consume system resources. Historically, people have done one of two things that generate support calls:

* not setting a status check interval at all ("Why is my service not being checked?"), or
* setting the status check interval far too low, like 10 seconds for an Oracle service ("Why is the cluster acting strangely/running slowly?").

If the status check interval is lower than the actual amount of time it takes to check the status of a service, you end up with endless status checking, which is a pure waste of resources.

From glisha at gmail.com Thu Jun 3 20:46:39 2010
From: glisha at gmail.com (Georgi Stanojevski)
Date: Thu, 3 Jun 2010 22:46:39 +0200
Subject: [Linux-cluster] only one service fails-over out of two depended services.
Message-ID:

Hi,

I have configured two services in my two-node cluster (RHEL 5.4):

service1 - with ip, ha-lvm and fs resources.
service2 - with a script resource, which depends on service1.
When I manually relocate the services, everything works as expected. But when I fail one node (halt -f), only service1 gets relocated to the other node; service2 "stays" on the failed node in the started state.

The logs say that only service1 will be taken over from the failed node. There is no mention that service2 should be failed over to the working node.

Jun  3 22:17:35 node1 clurgmgrd[22963]: Waiting for node #2 to be fenced
Jun  3 22:17:43 node1 clurgmgrd[22963]: Node #2 fenced; continuing
Jun  3 22:17:43 node1 clurgmgrd[22963]: Evaluating RG service:service1, state started, owner node2
Jun  3 22:17:43 node1 clurgmgrd[22963]: Evaluating RG service:service2, state started, owner node2
Jun  3 22:17:43 node1 clurgmgrd[22963]: Taking over service service:service1 from down member node2
...
Jun  3 22:17:45 yeti clurgmgrd[22963]: Service service:service1 started

Does anyone have an idea whether I am misconfiguring something?

Here is clustat when one node is failed:

===
Cluster Status for cluster1 @ Thu Jun  3 22:34:52 2010
Member Status: Quorate

 Member Name                      ID   Status
 ------ ----                      ---- ------
 node1                               1 Online, Local, rgmanager
 node2                               2 Offline

 Service Name              Owner (Last)              State
 ------- ----              ----- ------              -----
 service:service1          node1                     started
 service:service2          node2                     started
===

Here is the snippet of cluster.conf regarding the services:
===