[Linux-cluster] ricci is very unstable on one node

fosiul alam expertalert at gmail.com
Sat Sep 25 10:00:05 UTC 2010


Hi Ben,
Thanks.

I named this service mysql-server, but I have not installed the MySQL
database on it yet.

Both the luci server and node1 are running these versions of luci and ricci:

luci-0.12.2-12.el5.centos.1
ricci-0.12.2-12.el5.centos.1


Do you think this version has the problem as well?

Thanks for your help.



On 24 September 2010 15:33, Ben Turner <bturner at redhat.com> wrote:

> There is an issue with ricci timeouts that was fixed recently:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=564490
>
> I'm not sure, but you may be hitting that bug.  Symptoms include: luci isn't
> able to get the status from the node, timeouts when querying ricci, etc.
> The fix should be released with 5.6.
>
> On the mysql service there are some options that you need to set.  Here are
> all the options available to that agent:
>
> mysql
> Defines a MySQL database server
>
> Attribute               Description
> config_file             Define configuration file
> listen_address          Define an IP address for MySQL server. If the address
>                         is not given then first IP address from the service
>                         is taken.
> mysqld_options          Other command-line options for mysqld
> name                    Name
> ref                     Reference to existing mysql resource in the resources
>                         section.
> service_name            Inherit the service name.
> shutdown_wait           Wait X seconds for correct end of service shutdown
> startup_wait            Wait X seconds for correct end of service startup
> __enforce_timeouts      Consider a timeout for operations as fatal.
> __failure_expire_time   Amount of time before a failure is forgotten.
> __independent_subtree   Treat this and all children as an independent subtree.
> __max_failures          Maximum number of failures before returning a failure
>                         to a status check.
>
> If I recall correctly you may need to tweak:
>
> shutdown_wait   Wait X seconds for correct end of service shutdown
> startup_wait    Wait X seconds for correct end of service startup
>
> There can be problems relocating the DB if it takes too long to
> start/shutdown.  If you are having problems relocating with luci it may be a
> good idea to test with:
>
> # clusvcadm -r <service name> -m <cluster node>
>
> -Ben
>
>
>
> ----- "fosiul alam" <expertalert at gmail.com> wrote:
>
> > Hi,
> > I have a 4-node cluster.
> > It was running fine, but today one node is giving trouble.
> >
> > When I try to relocate a service onto this node, or from this node to
> > another node, the luci GUI shows:
> >
> > Unable to retrieve batch 1908047789 status from
> > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
> > Starting cluster service "httpd1" on node "http1.domain.local" -- You
> > will be redirected in 5 seconds.
> > Also:
> >
> > The ricci agent for this node is unresponsive. Node-specific
> > information is not available at this time. :
> >
> > But ricci is running on the problematic node:
> > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
> >
> > There is no firewall running:
> >
> > iptables -L
> > Chain INPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain RH-Firewall-1-INPUT (0 references)
> > target prot opt source destination
> >
> > Port 11111 is listening:
> >
> > netstat -an | grep 11111
> > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
> >
> >
> > But ricci is still very unstable, and I can't relocate any service onto
> > this node or away from it.
> >
> > Running clustat on the problematic node gives:
> >
> > clustat
> > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
> > Member Status: Quorate
> >
> >  Member Name               ID   Status
> >  ------ ----               ---- ------
> >  beaver.xxx.local             1 Online, rgmanager         (luci is running from this server)
> >  publicdns1.xxxx.local        2 Online, rgmanager
> >  http1.xxxx.local             3 Online, Local, rgmanager
> >  mail01.xxxxx.local           4 Online, rgmanager
> >
> >  Service Name              Owner (Last)               State
> >  ------- ----              ----- ------               -----
> >  service:httpd1            mail01.xxxx.local          started
> >  service:mysql-server      http1.xxxx.local           started   (this is the problematic node)
> >  service:public-dns        publicdns1.xxxxxx.local    started
> >
> > I can't move the mysql-server service off this node, and I can't relocate
> > any service onto it.
> > I am very confused.
> >
> > What should I do to fix this issue?
> >
> > Thanks for your advice.
> >
> >
> >
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>
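
The netstat output quoted above shows only that something is bound to port
11111 on the node. It does not show that the port accepts connections from
the luci server. Below is a minimal sketch of a reachability check that could
be run from the luci host. It is an illustration only, not part of luci or
ricci: the function name port_accepts_connection and the 5-second default
timeout are assumptions. ricci speaks SSL on this port, so a successful TCP
connect proves only that the listener accepts connections, not that the agent
answers requests.

```python
import socket


def port_accepts_connection(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of luci or ricci.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling port_accepts_connection("http1.domain.local", 11111) on
the luci server would test the node named in the luci error message. A False
result despite the LISTEN line in netstat would point at the network path or
name resolution rather than at ricci itself.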