[Linux-cluster] why all services stops when a node reboots?
ESGLinux
esggrupos at gmail.com
Fri Feb 13 10:44:35 UTC 2009
Hello all,
Following up on the problem, can anyone explain this?
All of the commands below were run within about one minute.
Disable the service:
[root at NODE2 log]# clusvcadm -d BBDD
Local machine disabling service:BBDD...Yes
Enable the service:
[root at NODE2 log]# clusvcadm -e BBDD
Local machine trying to enable service:BBDD...Success
service:BBDD is now running on node2
It's OK; the service is running on node2. Try to relocate it to node1:
[root at NODE2 log]# clusvcadm -r BBDD -m node1
Trying to relocate service:BBDD to node1...Success
service:BBDD is now running on node1
It works! Fine; try to relocate back to node2:
[root at NODE2 log]# clusvcadm -r BBDD -m node2
Trying to relocate service:BBDD to node2...Success
service:BBDD is now running on node2
It works again! I can't believe it. Try to relocate to node1 once more:
[root at NODE2 log]# clusvcadm -r BBDD -m node1
Trying to relocate service:BBDD to node1...Failure
Oops, it fails! Why? Why did it work 30 seconds ago and fail now?
In this situation, all I can do is disable and enable the service again to get
it working. It never comes up automatically...
[root at NODE2 log]# clusvcadm -d BBDD
Local machine disabling service:BBDD...Yes
[root at NODE2 log]# clusvcadm -e BBDD
Local machine trying to enable service:BBDD...Success
service:BBDD is now running on node2
Any explanation for this behaviour?
I'm completely astonished :-(
TIA
ESG
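[For context: in the cluster.conf posted later in this thread, the failover domain is ordered, with node1 at priority 1 and nofailback="0", so rgmanager pulls the service back to node1 as soon as node1 rejoins; when the stop on node2 fails during that pull-back, the service lands in the failed state. A hedged sketch of one way to suppress the automatic pull-back (same domain as posted, only the nofailback value changed):

```xml
<!-- Sketch only, based on the cluster.conf posted below in this thread.
     nofailback="1" keeps rgmanager from moving a running service back to
     the higher-priority node when it rejoins the cluster. -->
<failoverdomain name="DOMINIOFAIL" nofailback="1" ordered="1" restricted="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
</failoverdomain>
```

With this set, the service would stay on node2 after node1 rejoins and could be moved manually with `clusvcadm -r` once node1 is known to be healthy. This is a sketch against the posted configuration, not a tested fix, and it works around the failed stop rather than explaining it.]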
2009/2/13 ESGLinux <esggrupos at gmail.com>
> More clues,
>
> using system-config-cluster
>
> When I try to run a service that is in the failed state, I always get an
> error. I have to disable the service first to get it into the disabled
> state. From that state I can restart the services.
>
> I think I have a problem with relocation, because I cannot do it with luci,
> with system-config-cluster, or with clusvcadm.
>
> I always get an error when I try it.
>
> greetings
>
> ESG
>
>
> 2009/2/13 ESGLinux <esggrupos at gmail.com>
>
>> Hello,
>>
>> The services run OK on node1. If I halt node2 and try to run the services,
>> they run OK on node1.
>> If I run the services without the cluster, they also run OK.
>>
>> I have removed the HTTP service and left only the BBDD service, to debug
>> the problem. Here is the log from when the service is running on node2 and
>> node1 comes up:
>>
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering GATHER state from 11.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Creating commit token because I am the rep.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Saving state aru 1a high seq received 1a
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Storing new sequence id for ring 17f4
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering COMMIT state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering RECOVERY state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [0] member 192.168.1.185:
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.185
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 1a high delivered 1a received flag 1
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] position [1] member 192.168.1.188:
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] previous ring seq 6128 rep 192.168.1.188
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] aru 9 high delivered 9 received flag 1
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Did not need to originate any messages in recovery.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] Sending initial ORF token
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] CLM CONFIGURATION CHANGE
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] New Configuration:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] r(0) ip(192.168.1.185)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Left:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Joined:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] CLM CONFIGURATION CHANGE
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] New Configuration:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] r(0) ip(192.168.1.185)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] r(0) ip(192.168.1.188)
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Left:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] Members Joined:
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] r(0) ip(192.168.1.188)
>> Feb 13 09:16:00 NODE2 openais[3326]: [SYNC ] This node is within the primary component and will provide service.
>> Feb 13 09:16:00 NODE2 openais[3326]: [TOTEM] entering OPERATIONAL state.
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] got nodejoin message 192.168.1.185
>> Feb 13 09:16:00 NODE2 openais[3326]: [CLM  ] got nodejoin message 192.168.1.188
>> Feb 13 09:16:00 NODE2 openais[3326]: [CPG  ] got joinlist message from node 2
>> Feb 13 09:16:03 NODE2 kernel: dlm: connecting to 1
>> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Relocating service:BBDD to better node node1
>> Feb 13 09:16:24 NODE2 clurgmgrd[4001]: <notice> Stopping service service:BBDD
>> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed - Application Is Still Running
>> Feb 13 09:16:25 NODE2 clurgmgrd: [4001]: <err> Stopping Service mysql:mydb > Failed
>> Feb 13 09:16:25 NODE2 clurgmgrd[4001]: <notice> stop on mysql "mydb" returned 1 (generic error)
>> Feb 13 09:16:25 NODE2 avahi-daemon[3872]: Withdrawing address record for 192.168.1.183 on eth0.
>> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <crit> #12: RG service:BBDD failed to stop; intervention required
>> Feb 13 09:16:35 NODE2 clurgmgrd[4001]: <notice> Service service:BBDD is failed
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <warning> #70: Failed to relocate service:BBDD; restarting locally
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <err> #43: Service service:BBDD has failed; can not start.
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #2: Service service:BBDD returned failure code. Last Owner: node2
>> Feb 13 09:16:36 NODE2 clurgmgrd[4001]: <alert> #4: Administrator intervention required.
>>
>>
>> As you can see, the message says "Relocating service:BBDD to better node
>> node1", but it fails.
>>
>> Another error that appears frequently in my logs is this:
>>
>> <err> Checking Existence Of File /var/run/cluster/mysql/mysql:mydb.pid [mysql:mydb] > Failed - File Doesn't Exist
>>
>> I don't know if this is important, but I think it causes the message
>> "<err> Stopping Service mysql:mydb > Failed - Application Is Still Running",
>> and that in turn makes the service fail (I'm just guessing...).
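[The guess above is plausible: a stop action that locates the daemon only through its PID file must fail when the file is missing while mysqld still runs, and rgmanager then marks the whole service as failed. The sketch below is purely illustrative, not the real mysql resource agent; the function name is made up and the path is taken from the log line above:

```shell
#!/bin/sh
# Illustrative only: a stop routine that depends on a PID file.
# If the file is gone while the daemon still runs, stop can only fail,
# which matches the "Failed - Application Is Still Running" log above.

PIDFILE="${PIDFILE:-/var/run/cluster/mysql/mysql:mydb.pid}"

stop_by_pidfile() {
    if [ ! -f "$PIDFILE" ]; then
        echo "Checking Existence Of File $PIDFILE > Failed - File Doesn't Exist"
        return 1    # no PID to signal, so the running daemon is left untouched
    fi
    kill "$(cat "$PIDFILE")" 2>/dev/null && rm -f -- "$PIDFILE"
}

stop_by_pidfile || echo 'stop returned 1 (generic error)'
```

If mysqld (or mysqld_safe) never writes that PID file, or removes it, every relocate will fail at the stop step; checking why the PID file disappears on node2 would be the next thing to look at.]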
>>
>> Any idea?
>>
>>
>> ESG
>>
>>
>> 2009/2/12 rajveer singh <torajveersingh at gmail.com>
>>
>>> Hi,
>>>
>>> OK, perhaps there is some problem with the services on node1. Are you
>>> able to run these services on node1 without the cluster? First stop the
>>> cluster, then try to run these services on node1.
>>>
>>> They should run.
>>>
>>> Re,
>>> Rajveer Singh
>>>
>>> 2009/2/13 ESGLinux <esggrupos at gmail.com>
>>>
>>>> Hello,
>>>>
>>>> That's what I want: when node1 comes up, I want the service to relocate
>>>> to node1. But what I get is all my services stopped and in the failed
>>>> state.
>>>>
>>>> With my configuration I expect to have the services running on node1.
>>>>
>>>> Any idea about this behaviour?
>>>>
>>>> Thanks
>>>>
>>>> ESG
>>>>
>>>>
>>>> 2009/2/12 rajveer singh <torajveersingh at gmail.com>
>>>>
>>>>
>>>>>
>>>>> 2009/2/12 ESGLinux <esggrupos at gmail.com>
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I'm testing a cluster using luci as the admin tool. I have configured
>>>>>> two nodes with two services, HTTP + MySQL. This configuration works
>>>>>> almost fine: the services run on node1, and I reboot node1. The services
>>>>>> then relocate to node2 and everything continues working, but when node1
>>>>>> comes back up, all the services stop.
>>>>>>
>>>>>> I think that node1, when it comes back alive, tries to run the services,
>>>>>> and that makes the services stop. Can that be true? I think node1 should
>>>>>> not start anything, because the services are already running on node2.
>>>>>>
>>>>>> Perhaps it is a problem with the configuration, perhaps with fencing (I
>>>>>> have not configured fencing at all).
>>>>>>
>>>>>> Here is my cluster.conf. Any ideas?
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> ESG
>>>>>>
>>>>>>
>>>>>> <?xml version="1.0"?>
>>>>>> <cluster alias="MICLUSTER" config_version="29" name="MICLUSTER">
>>>>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>>         <clusternodes>
>>>>>>                 <clusternode name="node1" nodeid="1" votes="1">
>>>>>>                         <fence/>
>>>>>>                 </clusternode>
>>>>>>                 <clusternode name="node2" nodeid="2" votes="1">
>>>>>>                         <fence/>
>>>>>>                 </clusternode>
>>>>>>         </clusternodes>
>>>>>>         <cman expected_votes="1" two_node="1"/>
>>>>>>         <fencedevices/>
>>>>>>         <rm>
>>>>>>                 <failoverdomains>
>>>>>>                         <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="1" restricted="1">
>>>>>>                                 <failoverdomainnode name="node1" priority="1"/>
>>>>>>                                 <failoverdomainnode name="node2" priority="2"/>
>>>>>>                         </failoverdomain>
>>>>>>                 </failoverdomains>
>>>>>>                 <resources>
>>>>>>                         <ip address="192.168.1.183" monitor_link="1"/>
>>>>>>                 </resources>
>>>>>>                 <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="HTTP" recovery="relocate">
>>>>>>                         <apache config_file="conf/httpd.conf" name="http" server_root="/etc/httpd" shutdown_wait="0"/>
>>>>>>                         <ip ref="192.168.1.183"/>
>>>>>>                 </service>
>>>>>>                 <service autostart="1" domain="DOMINIOFAIL" exclusive="0" name="BBDD" recovery="relocate">
>>>>>>                         <mysql config_file="/etc/my.cnf" listen_address="192.168.1.183" name="mydb" shutdown_wait="0"/>
>>>>>>                         <ip ref="192.168.1.183"/>
>>>>>>                 </service>
>>>>>>         </rm>
>>>>>> </cluster>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster at redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>>>
>>>>>
>>>>> Hi ESG,
>>>>>
>>>>> Of course: you have defined the priority of node1 as 1 and node2 as 2,
>>>>> so node1 has the higher priority. Whenever it comes up, it will try to
>>>>> run the service itself, and so it will relocate the service from node2
>>>>> to node1.
>>>>>
>>>>>
>>>>> Re,
>>>>> Rajveer Singh
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
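[One more note on the fencing point raised in the thread: without any fence devices, rgmanager has no way to recover a node whose stop action hangs or fails, so manual intervention is the only option. The fragment below is a purely hypothetical sketch of what a fence section could look like; the agent choice, device name, IP address, and credentials are all placeholders, not values from this cluster:

```xml
<!-- Hypothetical sketch: every name, address, and credential below is a
     placeholder. Adapt to the real fence hardware before use. -->
<clusternode name="node1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="ipmi-node1"/>
                </method>
        </fence>
</clusternode>
<!-- node2 would get an analogous method and device -->
<fencedevices>
        <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.201"
                     login="admin" name="ipmi-node1" passwd="secret"/>
</fencedevices>
```
]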