[Linux-cluster] ricci is very unstable in one nodes

fosiul alam expertalert at gmail.com
Tue Sep 28 10:32:54 UTC 2010


Hi,

I found this interesting, but I don't know if it's normal or not.

I ran this command on all 3 cluster nodes:

tcpdump -i eth0 ip multicast


For some reason, I am seeing the same output on all 3 servers, for example:

11:26:13.700399 IP http1.xxxxx.local.5149 > 239.192.2.185.netsupport: UDP, length 118

Is this normal? (http1 is the node that is having trouble relocating
services in the cluster.)

So basically, whatever I see on the http1 server, I see the same output
on the rest.

Here 239.192.2.185 is the multicast address of the cluster.
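
For anyone who wants to check such a line programmatically, here is a
minimal Python sketch that pulls the source and destination out of a
tcpdump line and confirms the destination is a multicast address.
Multicast packets are delivered to every host that has joined the group,
so seeing the same packet on each cluster node would normally be expected.
The function name parse_multicast_line and the regular expression are only
an illustration, not part of tcpdump or the cluster tools. The trailing
".netsupport" is the service name tcpdump printed for the destination port;
running tcpdump with -n prints numeric ports instead.

```python
import ipaddress
import re

# Matches the "IP src > dst.port:" part of a tcpdump line where the
# destination is an IPv4 address followed by a port number or service name.
LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<port>[^\s:]+):"
)


def parse_multicast_line(line):
    """Return src/dst/port and a multicast flag, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    dst = ipaddress.ip_address(m.group("dst"))
    return {
        "src": m.group("src"),    # source host with its port suffix
        "dst": str(dst),          # destination IPv4 address
        "port": m.group("port"),  # destination port (number or name)
        "is_multicast": dst.is_multicast,
    }


# The line from the capture above, joined back onto one line.
sample = (
    "11:26:13.700399 IP http1.xxxxx.local.5149 > "
    "239.192.2.185.netsupport: UDP, length 118"
)
info = parse_multicast_line(sample)
print(info)
```

This only classifies the captured line; it says nothing about whether
rgmanager or ricci on http1 is healthy.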

Thanks
fosiul

On 27 September 2010 18:37, fosiul alam <expertalert at gmail.com> wrote:

> Hi, in addition to my previous email, have a look at this one,
>
> from http1 (where I am trying to relocate a service):
>
>
> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
> Member http1.xxxx.local trying to enable service:httpd1...Success
> Warning: service:httpd1 is now running on mail01.xxxx.local
>
> So it says Success, but it actually is not: the service is still
> running on mail01, not on http1.
>
> Thanks again
>
>
>
>
> On 27 September 2010 18:31, fosiul alam <expertalert at gmail.com> wrote:
>
>> Hi
>> Thanks for your advice.
>> Currently I have these installed:
>>
>>
>> luci-0.12.2-12.el5.centos.1
>> ricci-0.12.2-12.el5.centos.1
>>
>> Are these the same RPMs as
>>
>> luci-0.12.2-12.el5_5.4.i386.rpm  ?
>> ricci-0.12.2-12.el5_5.4.i386.rpm  ?
>>
>> Thanks
>>
>>
>>
>> On 27 September 2010 17:55, Paul M. Dyer <pmdyer at ctgcentral2.com> wrote:
>>
>>> http://rhn.redhat.com/errata/RHBA-2010-0716.html
>>>
>>> It appears that this problem has been fixed in this errata.
>>>
>>> I installed the luci and ricci updates and did some light testing. So
>>> far, the timeout error on port 11111 has not shown up.
>>>
>>> Paul
>>>
>>> ----- Original Message -----
>>> From: "fosiul alam" <expertalert at gmail.com>
>>> To: "linux clustering" <linux-cluster at redhat.com>
>>> Sent: Monday, September 27, 2010 10:48:27 AM
>>> Subject: Re: [Linux-cluster] ricci is very unstable in one nodes
>>>
>>> Hi
>>> I am trying to patch ricci. Let's see how it goes.
>>>
>>> But clusvcadm is failing as well:
>>>
>>> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
>>> Member http1.xxxx.local trying to enable service:httpd1...Invalid operation for resource
>>>
>>> Here http1 is the node where I was trying to run the service from luci.
>>>
>>> What could be the problem?
>>> Is there any way to find out whether there is a problem with the config?
>>>
>>> On 27 September 2010 16:26, Ben Turner < bturner at redhat.com > wrote:
>>>
>>>
>>> RHEL 5.6 hasn't been released yet so your package probably contains the
>>> problem. I'm not sure how in sync Centos is with RHEL or if they patch
>>> earlier so I cannot give you a time frame when it will be in Centos or
>>> if they have already patched it. The problem in that BZ is more of an
>>> annoyance, you usually just have to retry a time or two and it works. If
>>> you can't get Luci working properly with your service at all you should
>>> try enabling the service through the command line with clusvcadm -e. If
>>> it is not working from the command line either then there is a problem
>>> with the service config.
>>>
>>>
>>>
>>>
>>> -Ben
>>>
>>>
>>>
>>>
>>> ----- "fosiul alam" < expertalert at gmail.com > wrote:
>>>
>>> > Hi Ben
>>> > Thanks
>>> >
>>> > I named this cluster mysql-server, but I have not installed the MySQL
>>> > database on it yet.
>>> >
>>> > Both luci and ricci, on the luci server and on node1, are running this
>>> > version:
>>> >
>>> > luci-0.12.2-12.el5.centos.1
>>> > ricci-0.12.2-12.el5.centos.1
>>> >
>>> >
>>> > Do you think this version has the problem as well?
>>> >
>>> > thanks for your help
>>> >
>>> >
>>> >
>>> >
>>> > On 24 September 2010 15:33, Ben Turner < bturner at redhat.com > wrote:
>>> >
>>> >
>>> > There is an issue with ricci timeouts that was fixed recently:
>>> >
>>> > https://bugzilla.redhat.com/show_bug.cgi?id=564490
>>> >
>>> > I'm not sure but you may be hitting that bug. Symptoms include: luci
>>> > isn't able to get the status from the node, timeouts when querying
>>> > ricci, etc. The fix should be released with 5.6
>>> >
>>> > On the mysql service there are some options that you need to set. Here
>>> > are all the options available to that agent:
>>> >
>>> > mysql
>>> > Defines a MySQL database server
>>> >
>>> > Attribute              Description
>>> > config_file            Define configuration file
>>> > listen_address         Define an IP address for MySQL server. If the
>>> >                        address is not given then first IP address from
>>> >                        the service is taken.
>>> > mysqld_options         Other command-line options for mysqld
>>> > name                   Name
>>> > ref                    Reference to existing mysql resource in the
>>> >                        resources section.
>>> > service_name           Inherit the service name.
>>> > shutdown_wait          Wait X seconds for correct end of service shutdown
>>> > startup_wait           Wait X seconds for correct end of service startup
>>> > __enforce_timeouts     Consider a timeout for operations as fatal.
>>> > __failure_expire_time  Amount of time before a failure is forgotten.
>>> > __independent_subtree  Treat this and all children as an independent
>>> >                        subtree.
>>> > __max_failures         Maximum number of failures before returning a
>>> >                        failure to a status check.
>>> >
>>> > If I recall correctly you may need to tweak:
>>> >
>>> > shutdown_wait Wait X seconds for correct end of service shutdown
>>> > startup_wait Wait X seconds for correct end of service startup
>>> >
>>> > There can be problems relocating the DB if it takes too long to
>>> > start/shutdown. If you are having problems relocating with luci it may
>>> > be a good idea to test with:
>>> >
>>> > # clusvcadm -r <service name> -m <cluster node>
>>> >
>>> > -Ben
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > ----- "fosiul alam" < expertalert at gmail.com > wrote:
>>> >
>>> > > Hi
>>> > > I have a 4-node cluster.
>>> > > It was running fine, but today one node is giving trouble.
>>> > >
>>> > > When I try to relocate a service onto this node, or from this node
>>> > > to another node, the luci GUI shows:
>>> > >
>>> > > Unable to retrieve batch 1908047789 status from
>>> > > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
>>> > > Starting cluster service "httpd1" on node "http1.domain.local" -- You
>>> > > will be redirected in 5 seconds.
>>> > >
>>> > > and also:
>>> > >
>>> > > The ricci agent for this node is unresponsive. Node-specific
>>> > > information is not available at this time. :
>>> > >
>>> > > But ricci is running on the problematic node:
>>> > > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
>>> > >
>>> > > There is no firewall running:
>>> > >
>>> > > iptables -L
>>> > > Chain INPUT (policy ACCEPT)
>>> > > target prot opt source destination
>>> > >
>>> > > Chain FORWARD (policy ACCEPT)
>>> > > target prot opt source destination
>>> > >
>>> > > Chain OUTPUT (policy ACCEPT)
>>> > > target prot opt source destination
>>> > >
>>> > > Chain RH-Firewall-1-INPUT (0 references)
>>> > > target prot opt source destination
>>> > >
>>> > > Port 11111 is listening:
>>> > >
>>> > > netstat -an | grep 11111
>>> > > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
>>> > >
>>> > >
>>> > > But ricci is still very unstable, and I can't relocate any service
>>> > > onto this node or away from this node.
>>> > >
>>> > > If I run this on the problematic node:
>>> > >
>>> > > clustat
>>> > > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
>>> > > Member Status: Quorate
>>> > >
>>> > > Member Name              ID  Status
>>> > > ------ ----              --  ------
>>> > > beaver.xxx.local         1   Online, rgmanager         <-- luci is running from this server
>>> > > publicdns1.xxxx.local    2   Online, rgmanager
>>> > > http1.xxxx.local         3   Online, Local, rgmanager
>>> > > mail01.xxxxx.local       4   Online, rgmanager
>>> > >
>>> > > Service Name             Owner (Last)               State
>>> > > ------- ----             ----- ------               -----
>>> > > service:httpd1           mail01.xxxx.local          started
>>> > > service:mysql-server     http1.xxxx.local           started   <-- this is the problematic node
>>> > > service:public-dns       publicdns1.xxxxxx.local    started
>>> > >
>>> > > I can't move the mysql-server service off this node, and I can't
>>> > > relocate any service onto this node.
>>> > > I am very confused.
>>> > >
>>> > > What should I do to fix this issue?
>>> > >
>>> > > Thanks for your advice.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > -- Linux-cluster mailing list
>>> > > Linux-cluster at redhat.com
>>> > > https://www.redhat.com/mailman/listinfo/linux-cluster
>>> >
>>>
>>
>>
>