[Linux-cluster] clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out
Hagmann, Michael
Michael.Hagmann at hilti.com
Tue Jun 19 15:25:57 UTC 2007
Hi all,
we just hit this problem again:
Jun 18 08:03:08 lilr623a clurgmgrd[22152]: #48: Unable to obtain cluster lock: Connection timed out
Jun 18 08:03:35 lilr623f clurgmgrd: [21651]: Executing /usr/local/swadmin/caa/SAP/P06WD002 status
Jun 18 08:05:29 lilr623f clurgmgrd[21651]: #49: Failed getting status for RG P06WD002
Is there an open Bugzilla entry for this problem?
We also see that the crash may be related to the cron.daily entries.
Maybe some cron job triggers this DLM bug?
Below is the crontab: cron.daily starts at 08:02 and the cluster got
stuck at 08:03! The last time this happened, it was at the same time of day.
root@lilr623a:/tmp# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 8 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
root@lilr623a:/tmp# ls -l /etc/cron.daily
total 28
lrwxrwxrwx 1 root root 28 Oct 5 2006 00-logwatch -> ../log.d/scripts/logwatch.pl
-rwxr-xr-x 1 root root 418 Apr 14 2006 00-makewhatis.cron
-rwxr-xr-x 1 root root 276 Sep 28 2004 0anacron
-rwxr-xr-x 1 root root 180 Jul 13 2005 logrotate
-rwxr-xr-x 1 root root 48 Apr 9 2006 mcelog.cron
-rwxr-xr-x 1 root root 2133 Dec 1 2004 prelink
-rwxr-xr-x 1 root root 121 Aug 8 2005 slocate.cron
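To check whether the lock timeouts really line up with the cron.daily run, one could scan the syslog for #48 errors logged shortly after 08:02. The following is only a rough sketch, not a tested diagnostic: the function name, the 5-minute window and the year (classic syslog timestamps carry none) are assumptions made for illustration, and the regular expression is modelled only on the log lines quoted in this thread.

```python
import re
from datetime import datetime, timedelta

# Matches classic syslog lines from clurgmgrd that carry error #48, e.g.
#   "Jun 18 08:03:08 lilr623a clurgmgrd[22152]: #48: Unable to obtain ..."
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+clurgmgrd\S*\s+.*#48:"
)


def lock_timeouts_near_cron(lines, cron_hour=8, cron_minute=2,
                            window_minutes=5, year=2007):
    """Return (timestamp, host) for every #48 error logged within
    window_minutes after the cron.daily start time on the same day.

    cron_hour/cron_minute come from the '02 8 * * *' crontab entry;
    window_minutes and year are assumptions for this sketch.
    """
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(
            f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S"
        )
        cron_start = ts.replace(hour=cron_hour, minute=cron_minute, second=0)
        if timedelta(0) <= ts - cron_start <= timedelta(minutes=window_minutes):
            hits.append((ts, m["host"]))
    return hits


# Sample lines taken from the logs quoted in this thread.
sample = [
    "Jun 18 08:03:08 lilr623a clurgmgrd[22152]: #48: Unable to obtain "
    "cluster lock: Connection timed out",
    "Jun 18 08:05:29 lilr623f clurgmgrd[21651]: #49: Failed getting status "
    "for RG P06WD002",
    "May  7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain "
    "cluster lock: Connection timed out",
]
hits = lock_timeouts_near_cron(sample)
for ts, host in hits:
    print(ts.isoformat(), host)
```

Running this over several weeks of /var/log/messages would show whether the #48 errors cluster around the cron.daily start or also occur at unrelated times, as in the May 7 report quoted below.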
Thanks for your help
Mike
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Friday, 11 May 2007 22:19
To: linux clustering
Subject: Re: [Linux-cluster] clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out
On Mon, May 07, 2007 at 01:54:56PM -0400, rhurst at bidmc.harvard.edu
wrote:
> What could cause clurgmgrd fail like this? If clurgmgrd has a hiccup
> like this, is it supposed to shutdown its services? Is there
> something in our implementation that could have prevented this from
> shutting down?
>
> For unexplained reasons, we just had our CS service (WATSON) go down
> on its own, and the syslog entry details the event as:
>
> May 7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain cluster lock: Connection timed out
> May 7 13:18:41 db1 kernel: dlm: Magma: reply from 2 no lock
> May 7 13:18:41 db1 kernel: dlm: reply
> May 7 13:18:41 db1 kernel: rh_cmd 5
> May 7 13:18:41 db1 kernel: rh_lkid 200242
> May 7 13:18:41 db1 kernel: lockstate 2
> May 7 13:18:41 db1 kernel: nodeid 0
> May 7 13:18:41 db1 kernel: status 0
> May 7 13:18:41 db1 kernel: lkid ee0388
> May 7 13:18:41 db1 clurgmgrd[17888]: <notice> Stopping service WATSON
This is usually a DLM bug. Once the DLM gets into this state,
rgmanager blows up. What rgmanager are you using?
(There's only one lock per service; the complexity of the service
doesn't matter...)
--
Lon Hohberger - Software Engineer - Red Hat, Inc.
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster