[Linux-cluster] clurgmgrd stops service without reason

Mark Reynolds mark at sparkyone.com
Thu Aug 10 11:35:13 UTC 2006


Hi,

Have you been able to resolve this issue? I have exactly the same symptoms
on a Red Hat cluster (rgmanager version 1.9.46).

I receive a message "<notice> stopping service fileserver", and the node
shuts down and ends up rebooting because it cannot unmount a partition.

What worries me is that this has happened 3 times in 2 weeks with no
obvious reason, as the server is working fine up until that point.

The relevant section of my cluster.conf is:

<service autostart="0" domain="main" name="fileservices">
        <fs device="/dev/mapper/livevg-data" force_fsck="1"
            force_unmount="1" fsid="11439" fstype="ext3"
            mountpoint="/mnt/live" name="live" options="noatime"
            self_fence="1"/>
        <fs device="/dev/mapper/backupvg-data" force_fsck="1"
            force_unmount="1" fsid="53676" fstype="ext3"
            mountpoint="/mnt/backup" name="backup" options="noatime"
            self_fence="1"/>
        <ip address="192.168.11.253" monitor_link="1"/>
        <ip address="192.168.1.253" monitor_link="1"/>
        <script file="/etc/init.d/smb-rhcs" name="Samba"/>
        <script file="/etc/init.d/nfs-rhcs" name="NFS"/>
</service>
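
The earlier reply quoted below suggests that something is returning an
error code where it should not. A minimal sketch for checking that by hand
is to run each script resource with "status" and print its exit code. The
helper names (script_resources, check_status) are made up for illustration,
and the XML is a trimmed copy of the section above. It assumes, as I
understand it, that a non-zero exit from "status" is what counts as a
failure; it is not a statement of how clurgmgrd itself decides.

```python
import subprocess
import xml.etree.ElementTree as ET

# Trimmed copy of the service definition above; only the <script>
# resources matter for this check.
SERVICE_XML = """
<service autostart="0" domain="main" name="fileservices">
    <script file="/etc/init.d/smb-rhcs" name="Samba"/>
    <script file="/etc/init.d/nfs-rhcs" name="NFS"/>
</service>
"""


def script_resources(service_xml):
    """Return a (name, path) pair for every <script> resource in the service."""
    root = ET.fromstring(service_xml)
    return [(s.get("name"), s.get("file")) for s in root.iter("script")]


def check_status(path):
    """Run '<path> status' and return its exit code (hypothetical helper)."""
    return subprocess.call([path, "status"])


if __name__ == "__main__":
    for name, path in script_resources(SERVICE_XML):
        try:
            rc = check_status(path)
        except OSError as exc:
            # The script is missing or not executable on this machine.
            print("%s: could not run %s (%s)" % (name, path, exc))
            continue
        print("%s: %s status exited with %d" % (name, path, rc))
```

Running this periodically on the node that owns the service and logging
the output might show whether one of the two scripts occasionally reports
a non-zero status just before the service is stopped.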

Any thoughts or updates would be greatly appreciated, as this is occurring
on a production server.

Regards

Mark Reynolds




> > On Wed, 2006-08-02 at 13:03 +0200, Falk Hackenberger - MediaTransfer AG
> > Netresearch & Consulting wrote:
> >
> >>--snip--
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  Executing
> >>/exports/imap/checkimapstartup.sh status
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  Executing
> >>/exports/subversion/etc/rc.d/init.d/svnserver status
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  Checking 192.168.0.223,
> >>Level 0
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  192.168.0.223 present on
> >>eth0
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  Link for eth0: Detected
> >>Aug  1 17:31:28 kain clurgmgrd: [4780]:  Link detected on eth0
> >>Aug  1 17:31:37 kain clurgmgrd[4780]:  Stopping service storage
> >>--snap--
> >>
> >>How do I tell clurgmgrd to log the reason for stopping the
> >>service?
> >
> > Something must be returning an error code where it should not be; can
> > you post your service XML blob?
>
>it is very long and a little bit complex as i know... ;-)
>
>recovery="restart">




