[Linux-cluster] clurgmgrd stops service without reason

Thu Aug 10 20:42:41 UTC 2006

On Thu, 2006-08-10 at 12:35 +0100, Mark Reynolds wrote:
> Hi,
> 
> Have you been able to resolve this issue? I have the exact same symptoms
> on a Red Hat cluster (rgmanager version 1.9.46).
> 
> I receive a message "<notice> stopping service fileserver" and the node
> shuts down and ends up rebooting as it can't unmount a partition.
> 
> What worries me is that this has happened 3 times in 2 weeks with no
> obvious reason, as the server is working fine up until that point.
> 
> The relevant section of my cluster.conf is
> 
> <service autostart="0" domain="main" name="fileservices">
>     <fs device="/dev/mapper/livevg-data" force_fsck="1"
>         force_unmount="1" fsid="11439" fstype="ext3"
>         mountpoint="/mnt/live" name="live" options="noatime"
>         self_fence="1"/>
>     <fs device="/dev/mapper/backupvg-data" force_fsck="1"
>         force_unmount="1" fsid="53676" fstype="ext3"
>         mountpoint="/mnt/backup" name="backup" options="noatime"
>         self_fence="1"/>
>     <ip address="192.168.11.253" monitor_link="1"/>
>     <ip address="192.168.1.253" monitor_link="1"/>
>     <script file="/etc/init.d/smb-rhcs" name="Samba"/>
>     <script file="/etc/init.d/nfs-rhcs" name="NFS"/>
> </service>
> 
> Any thoughts or updates greatly appreciated as this is occurring on a
> production server.

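Well, your log messages and XML don't match: the log message refers to a
service named "fileserver", but the service in your cluster.conf is named
"fileservices".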

There's a recent bugzilla noting that rgmanager lacks sufficient error
reporting for several resource agents.

I will make a couple of updates to the resource agents shortly (today
or tomorrow), and you can drop them in on an already-running cluster,
without restarting rgmanager.  They should then tell you which part is
failing.  Based on the previously noted log messages, I would suspect
that either the Samba script or the NFS script is returning an error.
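
In the meantime, one rough way to narrow it down by hand is to run each
script's "status" action and look at the exit code. The sketch below
assumes rgmanager treats a non-zero exit from "status" as a failure of
the script resource; the check_status function is a hypothetical helper,
not part of rgmanager, and the two paths are the ones from the quoted
cluster.conf.

```shell
#!/bin/sh
# Hypothetical helper: run the "status" action of each init script given
# as an argument and report its exit code.
# Assumption: a non-zero exit code is what the cluster treats as a failure.
check_status() {
    for script in "$@"; do
        if [ ! -x "$script" ]; then
            echo "$script: not found or not executable"
            continue
        fi
        # Discard the script's own output; only the exit code matters here.
        "$script" status >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$script: status OK"
        else
            echo "$script: status FAILED (exit code $rc)"
        fi
    done
}

# Paths taken from the cluster.conf quoted above.
check_status /etc/init.d/smb-rhcs /etc/init.d/nfs-rhcs
```

Running this on the node that currently owns the service, a few times
over a period in which the service is otherwise healthy, may show
whether one of the scripts intermittently reports a non-zero status.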

-- Lon