[Linux-cluster] Problems starting only one cluster service (SOLVED but ...)
carlopmart
carlopmart at gmail.com
Thu Nov 29 14:51:52 UTC 2007
carlopmart wrote:
> carlopmart wrote:
>> Lon Hohberger wrote:
>>> On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
>>>> Hi all
>>>>
>>>> I have a very strange problem. I have configured three nodes under
>>>> RHCS on RHEL 5.1 servers. Everything works except for one service,
>>>> which never starts when rgmanager starts up. My cluster.conf is:
>>>>
>>>> <?xml version="1.0"?>
>>>> <cluster alias="RhelXenCluster" config_version="17"
>>>>     name="RhelXenCluster">
>>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>>   <clusternodes>
>>>>     <clusternode name="rhelclu01.hpulabs.org"
>>>>         nodeid="1" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="gnbd-fence"
>>>>               nodename="rhelclu01.hpulabs.org"/>
>>>>         </method>
>>>>       </fence>
>>>>       <multicast addr="239.192.75.55"
>>>>           interface="eth0"/>
>>>>     </clusternode>
>>>>     <clusternode name="rhelclu02.hpulabs.org"
>>>>         nodeid="2" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="gnbd-fence"
>>>>               nodename="rhelclu02.hpulabs.org"/>
>>>>         </method>
>>>>       </fence>
>>>>       <multicast addr="239.192.75.55"
>>>>           interface="eth0"/>
>>>>     </clusternode>
>>>>     <clusternode name="rhelclu03.hpulabs.org"
>>>>         nodeid="3" votes="1">
>>>>       <fence>
>>>>         <method name="1">
>>>>           <device name="gnbd-fence"
>>>>               nodename="rhelclu03.hpulabs.org"/>
>>>>         </method>
>>>>       </fence>
>>>>       <multicast addr="239.192.75.55"
>>>>           interface="xenbr0"/>
>>>>     </clusternode>
>>>>   </clusternodes>
>>>>   <cman expected_votes="1" two_node="0">
>>>>     <multicast addr="239.192.75.55"/>
>>>>   </cman>
>>>>   <fencedevices>
>>>>     <fencedevice agent="fence_gnbd" name="gnbd-fence"
>>>>         servers="rhelclu03.hpulabs.org"/>
>>>>   </fencedevices>
>>>>   <rm log_facility="local4" log_level="7">
>>>>     <failoverdomains>
>>>>       <failoverdomain name="PriCluster"
>>>>           ordered="1" restricted="1">
>>>>         <failoverdomainnode
>>>>             name="rhelclu01.hpulabs.org" priority="1"/>
>>>>         <failoverdomainnode
>>>>             name="rhelclu02.hpulabs.org" priority="2"/>
>>>>       </failoverdomain>
>>>>       <failoverdomain name="SecCluster"
>>>>           ordered="1" restricted="1">
>>>>         <failoverdomainnode
>>>>             name="rhelclu02.hpulabs.org" priority="1"/>
>>>>         <failoverdomainnode
>>>>             name="rhelclu01.hpulabs.org" priority="2"/>
>>>>       </failoverdomain>
>>>>     </failoverdomains>
>>>>     <resources>
>>>>       <ip address="172.25.50.10" monitor_link="1"/>
>>>>       <ip address="172.25.50.11" monitor_link="1"/>
>>>>       <ip address="172.25.50.12" monitor_link="1"/>
>>>>       <ip address="172.25.50.13" monitor_link="1"/>
>>>>       <ip address="172.25.50.14" monitor_link="1"/>
>>>>       <ip address="172.25.50.15" monitor_link="1"/>
>>>>       <ip address="172.25.50.16" monitor_link="1"/>
>>>>       <ip address="172.25.50.17" monitor_link="1"/>
>>>>       <ip address="172.25.50.18" monitor_link="1"/>
>>>>       <ip address="172.25.50.19" monitor_link="1"/>
>>>>       <ip address="172.25.50.20" monitor_link="1"/>
>>>>     </resources>
>>>>     <service autostart="1" domain="PriCluster"
>>>>         name="dns-svc" recovery="relocate">
>>>>       <ip ref="172.25.50.10">
>>>>         <script
>>>>             file="/data/cfgcluster/etc/init.d/named" name="named"/>
>>>>       </ip>
>>>>     </service>
>>>>     <service autostart="1" domain="SecCluster"
>>>>         name="mail-svc" recovery="relocate">
>>>>       <ip ref="172.25.50.11">
>>>>         <script
>>>>             file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
>>>>       </ip>
>>>>     </service>
>>>>     <service autostart="1" domain="SecCluster"
>>>>         name="rsync-svc" recovery="relocate">
>>>>       <ip ref="172.25.50.13">
>>>>         <script
>>>>             file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
>>>>       </ip>
>>>>     </service>
>>>>     <service autostart="1" domain="PriCluster"
>>>>         name="wwwsoft-svc" recovery="relocate">
>>>>       <ip ref="172.25.50.14">
>>>>         <script
>>>>             file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
>>>>       </ip>
>>>>     </service>
>>>>     <service autostart="1" domain="SecCluster"
>>>>         name="proxy-svc" recovery="relocate">
>>>>       <ip ref="172.25.50.15">
>>>>         <script
>>>>             file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
>>>>       </ip>
>>>>     </service>
>>>>   </rm>
>>>> </cluster>
>>>>
>>>> The service that returns errors and never starts when rgmanager
>>>> starts up is postfix-cluster. In the maillog file I find these errors:
>>>
>>>
>>>> Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter
>>>> inet_interfaces: no local interface found for 172.25.50.11
>>>> Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal:
>>>> /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied
>>>
>>>> But that's not true. If I start this service manually, everything
>>>> works. The Postfix configuration is fine. What can the problem be?
>>>> I don't know why rgmanager doesn't configure the 172.25.50.11 address
>>>> before executing the postfix-cluster service ...
>>>
>>> When you start it manually -- how?
>>> * add IP manually / running the script?
>>> * rg_test?
>>> * clusvcadm -e?
>>>
>>> -- Lon
>>
>> Another strange thing: this morning the service was stopped, and when
>> I try to start it with clusvcadm, it returns this error:
>>
>> Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <warning> #68: Failed to
>> start service:mail-svc; return value: 1
>> Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <notice> Stopping service
>> service:mail-svc
>> Nov 28 09:28:22 rhelclu01 clurgmgrd: [1450]: <err> script:postfix:
>> stop of /data/cfgcluster/etc/init.d/postfix-cluster failed (returned 1)
>> Nov 28 09:28:22 rhelclu01 clurgmgrd[1450]: <notice> stop on script
>> "postfix" returned 1 (generic error)
>> Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: setsockopt
>> (IP_ADD_MEMBERSHIP): Address already in use
>> Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: Failed joining addresses
>> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Service
>> service:mail-svc is recovering
>> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <warning> #71: Relocating
>> failed service service:mail-svc
>> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Stopping service
>> service:mail-svc
>>
>> I don't understand this. IP 172.25.50.11 isn't used by anyone ....
>>
>>
>
> I have finally found the problem: I needed to set the alternate_config
> parameter in the first postfix instance, and now everything works. The
> service starts, stops and relocates fine, but I found one small problem:
> clurgmgrd doesn't check the status of the service. If I remove the
> status option from the init script for the resource, nothing happens.
> Do I need to set a parameter in cluster.conf to check services every
> 1 or 2 minutes?
>
> Thanks.
>
Any hints, please?
--
CL Martinez
carlopmart {at} gmail {d0t} com
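
On the status question above: as far as I understand it, rgmanager's script resource calls the init script with a "status" argument from time to time and treats a non-zero exit code as a failure, so the init script needs a status branch that really reports whether the daemon is alive. Below is a minimal sketch of such a check, assuming a daemon that writes a pid file. The function name check_status and the pid file path are hypothetical and are not taken from the scripts in this thread; the exit codes follow the LSB convention for the status action.

```shell
#!/bin/sh
# Hypothetical status helper for an init script driven by a cluster manager.
# LSB-style exit codes: 0 = running, 1 = dead but pid file exists,
# 3 = not running (no pid file).
check_status() {
    pidfile="$1"

    # No pid file at all: report the service as stopped.
    [ -f "$pidfile" ] || return 3

    # Strip whitespace; some daemons pad the pid with spaces.
    pid=$(tr -d '[:space:]' < "$pidfile")

    # Empty pid file: treat it as stale.
    [ -n "$pid" ] || return 1

    # kill -0 sends no signal, it only tests whether the process exists.
    # Assumption: the script runs as root, so a permission error is not
    # mistaken for a dead process.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Example call; /var/run/example.pid is a placeholder path.
rc=0
check_status "${1:-/var/run/example.pid}" || rc=$?
echo "status exit code: $rc"
```

In a real init script the function would be called from the "status)" branch of the case statement and its return value passed to exit, so that the caller sees a non-zero code when the daemon has died.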