[Linux-cluster] ricci is very unstable in one nodes

fosiul alam expertalert at gmail.com
Tue Sep 28 12:10:49 UTC 2010


Hi,

I rebooted the whole cluster, every single server.

Once every node had been rebooted, everything looked all right:

[root@http1 ~]# clustat
Cluster Status for ng1 @ Tue Sep 28 13:03:45 2010
Member Status: Quorate

 Member Name                       ID   Status
 ------ ----                       ---- ------
 beaver.xx.local                      1 Online, rgmanager
 publicdns1.xxx.local                 2 Online, rgmanager
 http1.xxx.local                      3 Online, Local, rgmanager
 mail01.xxx.local                     4 Online, rgmanager

 Service Name              Owner (Last)              State
 ------- ----              ----- ------              -----
 service:httpd1            http1.xxx.local           started   <-- this is where it is supposed to be
 service:mysql-server      mail01.xxx.local          started
 service:public-dns        publicdns1.xxx.local      started
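
To compare service owners before and after a relocation attempt without
reading the table by eye, the service rows can be pulled out with a small
script. This is only a sketch in Python, assuming the service section keeps
the three-column layout shown above (name, owner, state); the function name
parse_clustat_services and the SAMPLE text are made up for illustration.

```python
def parse_clustat_services(text):
    """Return {service_name: (owner, state)} from clustat-style text."""
    services = {}
    for line in text.splitlines():
        fields = line.split()
        # Service rows look like: service:<name> <owner> <state>
        # Anything after the state column (e.g. a hand-written note) is ignored.
        if len(fields) >= 3 and fields[0].startswith("service:"):
            services[fields[0]] = (fields[1], fields[2])
    return services


# Sample copied from the output above (host names are the masked ones).
SAMPLE = """\
 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:httpd1                 http1.xxx.local      started
 service:mysql-server           mail01.xxx.local     started
 service:public-dns             publicdns1.xxx.local started
"""

if __name__ == "__main__":
    for name, (owner, state) in sorted(parse_clustat_services(SAMPLE).items()):
        print(name, owner, state)
```

Saving the parsed result before and after a clusvcadm call makes it easy to
see whether the owner really changed, whatever the command printed.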

But now, when I try to relocate that service from http1.xxx.local to
mail01.xxx.local, or even just try to access http1.xxx.local from the
luci server, I get the same problem again.
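
One thing that might help narrow this down is checking, from the luci
server itself, whether the ricci port on the node accepts TCP connections
at all (netstat on the node only shows that it is listening locally). The
following is a minimal sketch in Python 3; the helper name port_is_open is
made up, and 11111 is the ricci port mentioned earlier in this thread. A
successful connect only shows the port is reachable, not that ricci is
answering requests correctly.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on the luci server, port_is_open("http1.xxx.local", 11111) (with the
real node name substituted) would return False if the connection is refused,
times out, or the name does not resolve.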


So something else is upsetting it, but I don't know what.

Is there any way to debug this or see what is happening inside?

Thanks for your advice.


On 28 September 2010 11:32, fosiul alam <expertalert at gmail.com> wrote:

> Hi,
>
> I found this interesting, but I don't know whether it is normal or not.
>
> I ran this command on 3 cluster nodes:
>
> tcpdump -i eth0 ip multicast
>
>
> For some reason I am seeing the same output on all 3 servers, for example:
>
> 11:26:13.700399 IP http1.xxxxx.local.5149 > 239.192.2.185.netsupport: UDP,
> length 118
>
> Is this normal output? (http1 is the node that has trouble relocating
> services in the cluster.)
>
> So basically, whatever I see on the http1 server, I see the same
> output on the rest.
>
> Here 239.192.2.185 is the multicast address of the cluster.
>
> Thanks
> fosiul
>
>
> On 27 September 2010 18:37, fosiul alam <expertalert at gmail.com> wrote:
>
>> Hi, in addition to my previous email, have a look at this one,
>>
>> from http1 (where I am trying to relocate a service):
>>
>>
>> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
>> Member http1.xxxx.local trying to enable service:httpd1...Success
>> Warning: service:httpd1 is now running on mail01.xxxx.local
>>
>> So it reports Success,
>> but the service actually ends up on mail01, not on http1.
>>
>> Thanks again
>>
>>
>>
>>
>> On 27 September 2010 18:31, fosiul alam <expertalert at gmail.com> wrote:
>>
>>> Hi
>>> Thanks for your advice.
>>> Currently I have these:
>>>
>>>
>>> luci-0.12.2-12.el5.centos.1
>>> ricci-0.12.2-12.el5.centos.1
>>>
>>> Are these the same RPMs as
>>>
>>> luci-0.12.2-12.el5_5.4.i386.rpm  ?
>>> ricci-0.12.2-12.el5_5.4.i386.rpm  ?
>>>
>>> Thanks
>>>
>>>
>>>
>>> On 27 September 2010 17:55, Paul M. Dyer <pmdyer at ctgcentral2.com> wrote:
>>>
>>>> http://rhn.redhat.com/errata/RHBA-2010-0716.html
>>>>
>>>> It appears that this problem has been fixed in this errata.
>>>>
>>>> I installed the luci and ricci updates and did some light testing. So
>>>> far, the timeout 11111 error has not shown up.
>>>>
>>>> Paul
>>>>
>>>> ----- Original Message -----
>>>> From: "fosiul alam" <expertalert at gmail.com>
>>>> To: "linux clustering" <linux-cluster at redhat.com>
>>>> Sent: Monday, September 27, 2010 10:48:27 AM
>>>> Subject: Re: [Linux-cluster] ricci is very unstable in one nodes
>>>>
>>>> Hi
>>>> I am trying to patch ricci. Let's see how it goes.
>>>>
>>>> But clusvcadm is failing as well:
>>>>
>>>> [root@http1 ~]# clusvcadm -e httpd1 -m http1.xxxx.local
>>>> Member http1.xxxx.local trying to enable service:httpd1...Invalid
>>>> operation for resource
>>>>
>>>> Here http1 is the node where I was trying to run the service from luci.
>>>>
>>>> What could be the problem?
>>>> Is there any way to find out whether there is a problem with the config?
>>>>
>>>> On 27 September 2010 16:26, Ben Turner < bturner at redhat.com > wrote:
>>>>
>>>>
>>>> RHEL 5.6 hasn't been released yet so your package probably contains the
>>>> problem. I'm not sure how in sync Centos is with RHEL or if they patch
>>>> earlier so I cannot give you a time frame when it will be in Centos or
>>>> if they have already patched it. The problem in that BZ is more of an
>>>> annoyance, you usually just have to retry a time or two and it works. If
>>>> you can't get Luci working properly with your service at all you should
>>>> try enabling the service through the command line with clusvcadm -e. If
>>>> it is not working from the command line either then there is a problem
>>>> with the service config.
>>>>
>>>>
>>>>
>>>>
>>>> -Ben
>>>>
>>>>
>>>>
>>>>
>>>> ----- "fosiul alam" < expertalert at gmail.com > wrote:
>>>>
>>>> > Hi Ben
>>>> > Thanks
>>>> >
>>>> > I named this cluster mysql-server, but I have not installed the mysql
>>>> > database in there yet.
>>>> >
>>>> > Both luci and ricci, on the luci server and on node1, are running this
>>>> > version:
>>>> >
>>>> > luci-0.12.2-12.el5.centos.1
>>>> > ricci-0.12.2-12.el5.centos.1
>>>> >
>>>> >
>>>> > Do you think this version has the problem as well?
>>>> >
>>>> > Thanks for your help
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On 24 September 2010 15:33, Ben Turner < bturner at redhat.com > wrote:
>>>> >
>>>> >
>>>> > There is an issue with ricci timeouts that was fixed recently:
>>>> >
>>>> > https://bugzilla.redhat.com/show_bug.cgi?id=564490
>>>> >
>>>> > I'm not sure but you may be hitting that bug. Symptoms include: luci
>>>> > isn't able to get the status from the node, timeouts when querying
>>>> > ricci, etc. The fix should be released with 5.6
>>>> >
>>>> > On the mysql service there are some options that you need to set. Here
>>>> > are all the options available to that agent:
>>>> >
>>>> > mysql
>>>> > Defines a MySQL database server
>>>> >
>>>> > Attribute              Description
>>>> > config_file            Define configuration file
>>>> > listen_address         Define an IP address for MySQL server. If the
>>>> >                        address is not given then first IP address from
>>>> >                        the service is taken.
>>>> > mysqld_options         Other command-line options for mysqld
>>>> > name                   Name
>>>> > ref                    Reference to existing mysql resource in the
>>>> >                        resources section.
>>>> > service_name           Inherit the service name.
>>>> > shutdown_wait          Wait X seconds for correct end of service shutdown
>>>> > startup_wait           Wait X seconds for correct end of service startup
>>>> > __enforce_timeouts     Consider a timeout for operations as fatal.
>>>> > __failure_expire_time  Amount of time before a failure is forgotten.
>>>> > __independent_subtree  Treat this and all children as an independent
>>>> >                        subtree.
>>>> > __max_failures         Maximum number of failures before returning a
>>>> >                        failure to a status check.
>>>> >
>>>> > If I recall correctly you may need to tweak:
>>>> >
>>>> > shutdown_wait Wait X seconds for correct end of service shutdown
>>>> > startup_wait Wait X seconds for correct end of service startup
>>>> >
>>>> > There can be problems relocating the DB if it takes too long to
>>>> > start/shutdown. If you are having problems relocating with luci it may
>>>> > be a good idea to test with:
>>>> >
>>>> > # clusvcadm -r <service name> -m <cluster node>
>>>> >
>>>> > -Ben
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > ----- "fosiul alam" < expertalert at gmail.com > wrote:
>>>> >
>>>> > > Hi
>>>> > > I have a 4-node cluster.
>>>> > > It was running fine, but today one node is giving trouble.
>>>> > >
>>>> > > From the luci GUI, when I try to relocate a service onto this node,
>>>> > > or from this node to another node, it shows:
>>>> > >
>>>> > > Unable to retrieve batch 1908047789 status from
>>>> > > beaver.domain.local:11111: clusvcadm start failed to start httpd1:
>>>> > > Starting cluster service "httpd1" on node "http1.domain.local" -- You
>>>> > > will be redirected in 5 seconds.
>>>> > >
>>>> > > and also:
>>>> > >
>>>> > > The ricci agent for this node is unresponsive. Node-specific
>>>> > > information is not available at this time.
>>>> > >
>>>> > > But ricci is running on the problematic node:
>>>> > > ricci 7324 0.0 0.1 58876 2932 ? S<s 14:40 0:00 ricci -u 101
>>>> > >
>>>> > > There is no firewall running:
>>>> > >
>>>> > > iptables -L
>>>> > > Chain INPUT (policy ACCEPT)
>>>> > > target prot opt source destination
>>>> > >
>>>> > > Chain FORWARD (policy ACCEPT)
>>>> > > target prot opt source destination
>>>> > >
>>>> > > Chain OUTPUT (policy ACCEPT)
>>>> > > target prot opt source destination
>>>> > >
>>>> > > Chain RH-Firewall-1-INPUT (0 references)
>>>> > > target prot opt source destination
>>>> > >
>>>> > > Port 11111 is listening:
>>>> > >
>>>> > > netstat -an | grep 11111
>>>> > > tcp 0 0 0.0.0.0:11111 0.0.0.0:* LISTEN
>>>> > >
>>>> > >
>>>> > > But ricci is still very unstable, and I can't relocate any service
>>>> > > onto this node or away from this node.
>>>> > >
>>>> > > From the problematic node, if I type this:
>>>> > >
>>>> > > clustat
>>>> > > Cluster Status for ng1 @ Thu Sep 23 20:24:02 2010
>>>> > > Member Status: Quorate
>>>> > >
>>>> > > Member Name               ID   Status
>>>> > > ------ ----               ---- ------
>>>> > > beaver.xxx.local             1 Online, rgmanager   <-- luci is running from this server
>>>> > > publicdns1.xxxx.local        2 Online, rgmanager
>>>> > > http1.xxxx.local             3 Online, Local, rgmanager
>>>> > > mail01.xxxxx.local           4 Online, rgmanager
>>>> > >
>>>> > > Service Name              Owner (Last)              State
>>>> > > ------- ----              ----- ------              -----
>>>> > > service:httpd1            mail01.xxxx.local         started
>>>> > > service:mysql-server      http1.xxxx.local          started   <-- this is the problematic node
>>>> > > service:public-dns        publicdns1.xxxxxx.local   started
>>>> > >
>>>> > > I can't move the mysql-server service away from this node, and I
>>>> > > can't relocate any service onto this node.
>>>> > > I am very confused.
>>>> > >
>>>> > > What should I do to fix this issue?
>>>> > >
>>>> > > Thanks for your advice.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > -- Linux-cluster mailing list
>>>> > > Linux-cluster at redhat.com
>>>> > > https://www.redhat.com/mailman/listinfo/linux-cluster