[Linux-cluster] Problems starting only one cluster service (SOLVED but ...)

carlopmart carlopmart at gmail.com
Wed Nov 28 15:38:57 UTC 2007


carlopmart wrote:
> Lon Hohberger wrote:
>> On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
>>> Hi all
>>>
>>>   I have a very strange problem. I have configured three nodes under 
>>> RHCS on RHEL 5.1 servers. Everything works except for one service that 
>>> never starts when rgmanager starts up. My cluster.conf is:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="RhelXenCluster" config_version="17" name="RhelXenCluster">
>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>   <clusternodes>
>>>     <clusternode name="rhelclu01.hpulabs.org" nodeid="1" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu01.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu02.hpulabs.org" nodeid="2" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu02.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu03.hpulabs.org" nodeid="3" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu03.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="xenbr0"/>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <cman expected_votes="1" two_node="0">
>>>     <multicast addr="239.192.75.55"/>
>>>   </cman>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_gnbd" name="gnbd-fence" servers="rhelclu03.hpulabs.org"/>
>>>   </fencedevices>
>>>   <rm log_facility="local4" log_level="7">
>>>     <failoverdomains>
>>>       <failoverdomain name="PriCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="SecCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>     </failoverdomains>
>>>     <resources>
>>>       <ip address="172.25.50.10" monitor_link="1"/>
>>>       <ip address="172.25.50.11" monitor_link="1"/>
>>>       <ip address="172.25.50.12" monitor_link="1"/>
>>>       <ip address="172.25.50.13" monitor_link="1"/>
>>>       <ip address="172.25.50.14" monitor_link="1"/>
>>>       <ip address="172.25.50.15" monitor_link="1"/>
>>>       <ip address="172.25.50.16" monitor_link="1"/>
>>>       <ip address="172.25.50.17" monitor_link="1"/>
>>>       <ip address="172.25.50.18" monitor_link="1"/>
>>>       <ip address="172.25.50.19" monitor_link="1"/>
>>>       <ip address="172.25.50.20" monitor_link="1"/>
>>>     </resources>
>>>     <service autostart="1" domain="PriCluster" name="dns-svc" recovery="relocate">
>>>       <ip ref="172.25.50.10">
>>>         <script file="/data/cfgcluster/etc/init.d/named" name="named"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
>>>       <ip ref="172.25.50.11">
>>>         <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="rsync-svc" recovery="relocate">
>>>       <ip ref="172.25.50.13">
>>>         <script file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="PriCluster" name="wwwsoft-svc" recovery="relocate">
>>>       <ip ref="172.25.50.14">
>>>         <script file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="proxy-svc" recovery="relocate">
>>>       <ip ref="172.25.50.15">
>>>         <script file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
>>>       </ip>
>>>     </service>
>>>   </rm>
>>> </cluster>
>>>
>>>   The service that returns errors and never starts when rgmanager 
>>> starts up is postfix-cluster. In the maillog file I find these errors:
>>
>>
>>>   Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter 
>>> inet_interfaces: no local interface found for 172.25.50.11
>>> Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal: 
>>> /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied
>>
>>>   But that's not true. If I start this service manually, everything 
>>> works. The Postfix configuration is fine. What can the problem be? I 
>>> don't know why rgmanager doesn't configure the 172.25.50.11 address 
>>> before starting the postfix-cluster service ...
>>
>> When you start it manually -- how?
>> * add IP manually / running the script?
>> * rg_test?
>> * clusvcadm -e?
>>
>> -- Lon
> 
> Another strange thing: this morning the service was stopped, and when I 
> try to start it with clusvcadm, I get these errors:
> 
> Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <warning> #68: Failed to 
> start service:mail-svc; return value: 1
> Nov 28 09:28:21 rhelclu01 clurgmgrd[1450]: <notice> Stopping service 
> service:mail-svc
> Nov 28 09:28:22 rhelclu01 clurgmgrd: [1450]: <err> script:postfix: stop 
> of /data/cfgcluster/etc/init.d/postfix-cluster failed (returned 1)
> Nov 28 09:28:22 rhelclu01 clurgmgrd[1450]: <notice> stop on script 
> "postfix" returned 1 (generic error)
> Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: setsockopt 
> (IP_ADD_MEMBERSHIP): Address already in use
> Nov 28 09:28:22 rhelclu01 in.rdiscd[11610]: Failed joining addresses
> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Service 
> service:mail-svc is recovering
> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <warning> #71: Relocating 
> failed service service:mail-svc
> Nov 28 09:28:32 rhelclu01 clurgmgrd[1450]: <notice> Stopping service 
> service:mail-svc
> 
>  I don't understand this. IP 172.25.50.11 isn't used by anyone ....
> 
> 

I finally found the problem: I needed to set the alternate_config_directories 
parameter in the first Postfix instance, and now everything works. The service 
starts, stops and relocates correctly. One small problem remains: clurgmgrd 
does not seem to check the status of the service. If I remove the status option 
from the resource's init script, nothing happens. Do I need to add a parameter 
to cluster.conf so that services are checked every minute or two?
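
As a rough illustration of what such a cluster.conf setting could look like: rgmanager normally takes its status check interval from the resource agent's own metadata, and Red Hat's cluster documentation for later releases describes overriding it per resource with an <action> child element. The fragment below is only a sketch under the assumption that the rgmanager shipped with RHEL 5.1 honours this override, which has not been verified here. The interval of 60 seconds is an illustrative value, not a recommendation.

```xml
<!-- Sketch only: per-resource override of the status check interval.
     Assumes the installed rgmanager supports <action> overrides. -->
<service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
  <ip ref="172.25.50.11">
    <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix">
      <!-- interval is in seconds; 60 is an illustrative value -->
      <action name="status" depth="*" interval="60"/>
    </script>
  </ip>
</service>
```

As with any edit to cluster.conf, config_version would have to be incremented and the new file propagated to all nodes before the change could take effect.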

Thanks.

-- 
CL Martinez
carlopmart {at} gmail {d0t} com



