[Linux-cluster] ricci is very unstable in one nodes

Mon Sep 27 17:31:31 UTC 2010

Hi,
Thanks for your advice.
Currently I have these installed:

luci-0.12.2-12.el5.centos.1
ricci-0.12.2-12.el5.centos.1

Are these the same RPMs as

luci-0.12.2-12.el5_5.4.i386.rpm
ricci-0.12.2-12.el5_5.4.i386.rpm ?
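
A rough way to compare package strings like these is to split each one into name, version and release. The sketch below assumes the usual name-version-release layout and only strips an .i386 arch suffix; split_nvr is a hypothetical helper written for this message, not part of rpm. On a real system the installed strings would come from something like "rpm -q luci ricci".

```shell
#!/bin/sh
# split_nvr: hypothetical helper, not an rpm command.
# Splits "name-version-release[.i386][.rpm]" into three fields.
# Assumes the last two hyphen-separated fields are version and release.
split_nvr() {
    nvr=${1%.rpm}        # drop a trailing ".rpm", if present
    nvr=${nvr%.i386}     # drop the arch suffix (only i386 is handled here)
    release=${nvr##*-}   # text after the last hyphen
    rest=${nvr%-*}       # everything before the last hyphen
    version=${rest##*-}  # text after the remaining last hyphen
    name=${rest%-*}      # what is left is the package name
    printf '%s %s %s\n' "$name" "$version" "$release"
}

split_nvr luci-0.12.2-12.el5.centos.1       # luci 0.12.2 12.el5.centos.1
split_nvr luci-0.12.2-12.el5_5.4.i386.rpm   # luci 0.12.2 12.el5_5.4
```

For the strings above, the name and version (0.12.2) match but the release fields differ (12.el5.centos.1 versus 12.el5_5.4), so they are not the same build.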

Thanks


On 27 September 2010 17:55, Paul M. Dyer <pmdyer at ctgcentral2.com> wrote:

> http://rhn.redhat.com/errata/RHBA-2010-0716.html
>
> It appears that this problem has been fixed in this errata.
>
> I installed the luci and ricci updates and did some light testing. So far,
> the port 11111 timeout error has not shown up.
>
> Paul
>
> ----- Original Message -----
> From: "fosiul alam" <expertalert at gmail.com>
> To: "linux clustering" <linux-cluster at redhat.com>
> Sent: Monday, September 27, 2010 10:48:27 AM
> Subject: Re: [Linux-cluster] ricci is very unstable in one nodes
>
> Hi
> I am trying to patch ricci. Let's see how it goes.
>
> But clusvcadm is failing as well:
>
> [root at http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
> Member http1.xxxx.local trying to enable service:httpd1...Invalid
> operation for resource
>
> Here, http1 is the node where I was trying to run the service from luci.
>
> What could be the problem?
> Is there any way to find out whether there is a problem with the config?
>
> On 27 September 2010 16:26, Ben Turner < bturner at redhat.com > wrote:
>
>
> RHEL 5.6 hasn't been released yet, so your package probably contains the
> problem. I'm not sure how closely CentOS tracks RHEL or whether they patch
> earlier, so I can't say when the fix will be in CentOS or whether they
> have already patched it. The problem in that BZ is more of an annoyance:
> you usually just have to retry once or twice and it works. If you can't
> get luci working properly with your service at all, try enabling the
> service from the command line with clusvcadm -e. If it doesn't work from
> the command line either, then there is a problem with the service config.
>
>
>
>
> -Ben
>
>
>
>
> ----- "fosiul alam" < expertalert at gmail.com > wrote:
>
> > Hi Ben
> > Thanks
> >
> > I named this cluster mysql-server, but I have not installed the MySQL
> > database there yet.
> >
> > Both luci and ricci, on the luci server and on node1, are running this
> > version:
> >
> > luci-0.12.2-12.el5.centos.1
> > ricci-0.12.2-12.el5.centos.1
> >
> >
> > Do you think this version has the problem as well?
> >
> > Thanks for your help.
> >
> >
> >
> >
> > On 24 September 2010 15:33, Ben Turner < bturner at redhat.com > wrote:
> >
> >
> > There is an issue with ricci timeouts that was fixed recently:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=564490
> >
> > I'm not sure but you may be hitting that bug. Symptoms include: luci
> > isn't able to get the status from the node, timeouts when querying
> > ricci, etc. The fix should be released with 5.6
> >
> > On the mysql service there are some options that you need to set. Here
> > are all the options available to that agent:
> >
> > mysql
> > Defines a MySQL database server
> >
> > Attribute              Description
> > config_file            Define configuration file
> > listen_address         Define an IP address for MySQL server. If the
> >                        address is not given then first IP address from
> >                        the service is taken.
> > mysqld_options         Other command-line options for mysqld
> > name                   Name
> > ref                    Reference to existing mysql resource in the
> >                        resources section.
> > service_name           Inherit the service name.
> > shutdown_wait          Wait X seconds for correct end of service shutdown
> > startup_wait           Wait X seconds for correct end of service startup
> > __enforce_timeouts     Consider a timeout for operations as fatal.
> > __failure_expire_time  Amount of time before a failure is forgotten.
> > __independent_subtree  Treat this and all children as an independent
> >                        subtree.
> > __max_failures         Maximum number of failures before returning a
> >                        failure to a status check.
> >
> > If I recall correctly you may need to tweak:
> >
> > shutdown_wait Wait X seconds for correct end of service shutdown
> > startup_wait Wait X seconds for correct end of service startup
> >
> > There can be problems relocating the DB if it takes too long to
> > start/shutdown. If you are having problems relocating with luci it may
> > be a good idea to test with:
> >
> > # clusvcadm -r <service name> -m <cluster node>
> >
> > -Ben
> >
> >
> >
> >
> >
> >
> > ----- "fosiul alam" < expertalert at gmail.com > wrote:
> >
> > > Hi
> > > I have a 4-node cluster.
> > > It was running fine, but today one node is giving trouble.
> > >
> > > When I try to relocate a service onto this node, or from this node
> > > to another node, the luci GUI shows:
> > >
> > > Unable to retrieve batch 1908047789 status from
> > > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
> > > Starting cluster service "httpd1" on node "http1.domain.local" -- You
> > > will be redirected in 5 seconds.
> > >
> > > and also:
> > >
> > > The ricci agent for this node is unresponsive. Node-specific
> > > information is not available at this time. :
> > >
> > > But ricci is running on the problematic node:
> > > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
> > >
> > > There is no firewall running:
> > >
> > > iptables -L
> > > Chain INPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain FORWARD (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain OUTPUT (policy ACCEPT)
> > > target prot opt source destination
> > >
> > > Chain RH-Firewall-1-INPUT (0 references)
> > > target prot opt source destination
> > >
> > > Port 11111 is listening:
> > >
> > > netstat -an | grep 11111
> > > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
> > >
> > >
> > > But ricci is still very unstable, and I can't relocate any service
> > > onto this node or away from it.
> > >
> > > On the problematic node, clustat shows:
> > >
> > > clustat
> > > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
> > > Member Status: Quorate
> > >
> > > Member Name              ID   Status
> > > ------ ----              ---- ------
> > > beaver.xxx.local         1    Online, rgmanager   <-- luci is running from this server
> > > publicdns1.xxxx.local    2    Online, rgmanager
> > > http1.xxxx.local         3    Online, Local, rgmanager
> > > mail01.xxxxx.local       4    Online, rgmanager
> > >
> > > Service Name             Owner (Last)              State
> > > ------- ----             ----- ------              -----
> > > service:httpd1           mail01.xxxx.local         started
> > > service:mysql-server     http1.xxxx.local          started   <-- this is the problematic node
> > > service:public-dns       publicdns1.xxxxxx.local   started
> > >
> > > I can't move the mysql-server service off this node, and I can't
> > > relocate any service onto it.
> > > I am very confused.
> > >
> > > What should I do to fix this issue?
> > >
> > > Thanks for your advice.
> > >
> > >
> > >
> > >
> > > -- Linux-cluster mailing list
> > > Linux-cluster at redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-cluster