[Linux-cluster] Clustat shows wrong service status
Lon Hohberger
lhh at redhat.com
Thu Feb 28 18:25:50 UTC 2008
On Thu, 2008-02-28 at 12:32 +0100, Agnieszka Kukałowicz wrote:
> >
> > And I don't have situation that cman_tool nodes says:
> > " Last fenced 2008-02-27 15:24:16 by override"
> >
>
> I did more tests to find the cause of the problem.
> I found that clustat has a problem with "restricted" failover domains.
> I tested 2 examples of my configuration:
>
> 1. failover domain is "restricted"
>
> <rm>
>   <failoverdomains>
>     <failoverdomain name="VM_w1_failover" ordered="0" restricted="1">
>       <failoverdomainnode name="w1.local" priority="1"/>
>     </failoverdomain>
>     <failoverdomain name="VM_w2_failover" ordered="0" restricted="1">
>       <failoverdomainnode name="w2.local" priority="1"/>
>     </failoverdomain>
>   </failoverdomains>
>   <resources/>
>   <vm autostart="1" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work11_RHEL51" path="/virts/w11" recovery="restart"/>
>   <vm autostart="1" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work12_RHEL51" path="/virts/w12" recovery="restart"/>
>   <vm autostart="0" domain="VM_w1_failover" exclusive="0"
>       name="VM_Work13_RHEL51" path="/virts/w13" recovery="disable"/>
>   <vm autostart="1" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work21_RHEL51" path="/virts/w21" recovery="restart"/>
>   <vm autostart="0" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work22_RHEL51" path="/virts/w22" recovery="disable"/>
>   <vm autostart="0" domain="VM_w2_failover" exclusive="0"
>       name="VM_Work23_RHEL51" path="/virts/w23" recovery="disable"/>
> </rm>
>
> Member Status: Quorate
>
>   Member Name   ID   Status
>   ------ ----   ---- ------
>   w2.local      1    Online, Local, rgmanager
>   w1.local      2    Online, rgmanager
>
>   Service Name          Owner (Last)   State
>   ------- ----          ----- ------   -----
>   vm:VM_Work11_RHEL51   w1.local       started
>   vm:VM_Work12_RHEL51   w1.local       started
>   vm:VM_Work13_RHEL51   (none)         disabled
>   vm:VM_Work21_RHEL51   w2.local       started
>   vm:VM_Work22_RHEL51   (none)         disabled
>   vm:VM_Work23_RHEL51   (none)         disabled
>
> After powering off node w2.local and having "w1.local" fence "w2.local",
> clustat still shows the service vm:VM_Work21_RHEL51 as started on
> w2.local.
>
Oh, so you had a restricted failover domain, and none of its member nodes were online.
Looking at the code, it appears to be "correct weirdness" or perhaps
"known dysfunction":
http://sources.redhat.com/git/?p=cluster.git;a=blob;f=rgmanager/src/daemons/groups.c;h=a8325eec3425bbe124696bfe3dcd6f7ea1eebfea;hb=8e504af1adbadd2cb8fe7cab191d79a8d835540c#l742
/*
 * TODO
 * Mark a service as 'stopped' if no members in its
 * restricted fail-over domain are running.
 */
Why it occurs:
Service states are typically only altered by a node which is taking
action on a service. In this case, no nodes are online which are
allowed to act on the service, so the remaining nodes do nothing and
the last recorded state is left as it was. What needs to happen in
this case is:
* Check service failover domain config
* Check nodes online
* Mark service 'stopped' if no online node is capable of running
the service
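
The three checks above can be sketched roughly as follows. This is only an illustration in Python, not the rgmanager code (which is C); the function name orphaned_services, the variable CLUSTER_RM, and the trimmed two-VM copy of the quoted <rm> section are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <rm> section quoted above (two of the six VMs).
CLUSTER_RM = """
<rm>
  <failoverdomains>
    <failoverdomain name="VM_w1_failover" ordered="0" restricted="1">
      <failoverdomainnode name="w1.local" priority="1"/>
    </failoverdomain>
    <failoverdomain name="VM_w2_failover" ordered="0" restricted="1">
      <failoverdomainnode name="w2.local" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <vm autostart="1" domain="VM_w1_failover" name="VM_Work11_RHEL51"/>
  <vm autostart="1" domain="VM_w2_failover" name="VM_Work21_RHEL51"/>
</rm>
"""


def orphaned_services(rm_xml, online_nodes):
    """Hypothetical helper: list services that no online node may run."""
    root = ET.fromstring(rm_xml)
    online = set(online_nodes)

    # Step 1: check service failover domain config.
    domains = {}
    for dom in root.iter("failoverdomain"):
        members = {n.get("name") for n in dom.iter("failoverdomainnode")}
        domains[dom.get("name")] = (dom.get("restricted") == "1", members)

    orphaned = []
    for vm in root.iter("vm"):
        restricted, members = domains.get(vm.get("domain"), (False, set()))
        # Step 2: check nodes online.
        # Step 3: flag the service if its restricted domain has no online member.
        if restricted and not (members & online):
            orphaned.append("vm:" + vm.get("name"))
    return orphaned


# w2.local has been fenced, so only w1.local is online.
print(orphaned_services(CLUSTER_RM, ["w1.local"]))
# prints ['vm:VM_Work21_RHEL51']
```

A service returned by this check is one that would be reported as 'stopped' rather than left in its stale 'started' state.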
This hasn't been fixed because it doesn't actually cause any
service-availability problems - it just causes wrong reporting.
Could you file a bugzilla? :)
-- Lon