[Linux-cluster] clurgmgrd - <err> #48: Unable to obtain cluster lock: Connection timed out

Mon May 14 19:46:50 UTC 2007

Any new thoughts on this?  Is it a new bug, or is it fixed in U5?  I
have a ticket open, but your insight into how likely this is to be a
recurring bug would be helpful.  Thanks.

On Fri, 2007-05-11 at 19:54 -0400, rhurst at bidmc.harvard.edu wrote:
> We are using RHEL 4 U4 with the GFS/CS that works for that release:
>  
> $ rpm -q rgmanager dlm dlm-kernel magma magma-plugins
> 
> rgmanager-1.9.54-1
> dlm-1.0.1-1
> dlm-kernel-2.6.9-44.9
> magma-1.0.6-0
> magma-plugins-1.0.9-0
> 
> Would the just-announced GFS/CS for U5 help any?  Looks like a lot of
> issues were addressed.
>  
> Robert Hurst, Sr. Caché Administrator
> Beth Israel Deaconess Medical Center
> 1135 Tremont Street, REN-7
> Boston, Massachusetts   02120-2140
> 617-754-8754 · Fax: 617-754-8730 · Cell: 401-787-3154
> Any technology distinguishable from magic is insufficiently advanced.
> 
> 
> ______________________________________________________________________
> From: linux-cluster-bounces at redhat.com on behalf of Lon Hohberger
> Sent: Fri 5/11/2007 4:19 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] clurgmgrd - <err> #48: Unable to obtain
> cluster lock: Connection timed out
> 
> 
> On Mon, May 07, 2007 at 01:54:56PM -0400, rhurst at bidmc.harvard.edu
> wrote:
> > What could cause clurgmgrd to fail like this?  If clurgmgrd has a
> > hiccup like this, is it supposed to shut down its services?  Is
> > there something in our implementation that could have prevented
> > this from shutting down?
> >
> > For unexplained reasons, we just had our CS service (WATSON) go
> > down on its own, and the syslog entry details the event as:
> >
> > May  7 13:18:39 db1 clurgmgrd[17888]: <err> #48: Unable to obtain
> > cluster lock: Connection timed out
> > May  7 13:18:41 db1 kernel: dlm: Magma: reply from 2 no lock
> > May  7 13:18:41 db1 kernel: dlm: reply
> > May  7 13:18:41 db1 kernel: rh_cmd 5
> > May  7 13:18:41 db1 kernel: rh_lkid 200242
> > May  7 13:18:41 db1 kernel: lockstate 2
> > May  7 13:18:41 db1 kernel: nodeid 0
> > May  7 13:18:41 db1 kernel: status 0
> > May  7 13:18:41 db1 kernel: lkid ee0388
> > May  7 13:18:41 db1 clurgmgrd[17888]: <notice> Stopping service WATSON
> 
> This is usually a dlm bug.  Once the DLM gets into this state,
> rgmanager blows up.  Which version of rgmanager are you using?
> 
> (There's only one lock per service; the complexity of the service
> doesn't matter...)
> 
> --
> Lon Hohberger - Software Engineer - Red Hat, Inc.
> 
> 
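
For anyone checking their own logs for the same pattern, here is a
minimal sketch (not from the original thread) that scans syslog lines
for the rgmanager #48 lock error and pairs each one with the next
"Stopping service" notice from the same clurgmgrd PID.  The regular
expressions are inferred only from the excerpt quoted above and assume
each syslog entry sits on a single line; the function name
find_lock_timeouts is hypothetical.

```python
import re

# Patterns inferred from the syslog excerpt quoted in this thread (assumption:
# one syslog entry per line, message text exactly as shown there).
LOCK_TIMEOUT = re.compile(
    r"clurgmgrd\[(?P<pid>\d+)\]: <err> #48: Unable to obtain cluster lock"
)
SERVICE_STOP = re.compile(
    r"clurgmgrd\[(?P<pid>\d+)\]: <notice> Stopping service (?P<name>\S+)"
)


def find_lock_timeouts(lines):
    """Return a list of (error_line, stopped_service_or_None) pairs.

    Each #48 lock error is paired with the first later "Stopping service"
    notice logged by the same clurgmgrd PID, if there is one.
    """
    results = []
    pending = None  # (pid, error line) still waiting for a stop notice
    for raw in lines:
        line = raw.rstrip("\n")
        err = LOCK_TIMEOUT.search(line)
        if err:
            if pending:
                # Previous error never saw a matching stop notice.
                results.append((pending[1], None))
            pending = (err.group("pid"), line)
            continue
        stop = SERVICE_STOP.search(line)
        if stop and pending and stop.group("pid") == pending[0]:
            results.append((pending[1], stop.group("name")))
            pending = None
    if pending:
        results.append((pending[1], None))
    return results


if __name__ == "__main__":
    # Hypothetical log path; adjust for the node being checked.
    with open("/var/log/messages", errors="replace") as log:
        for error_line, service in find_lock_timeouts(log):
            print(service or "(no service stop seen)", "<-", error_line)
```

This only flags the symptom in the logs; it says nothing about whether
the underlying cause is the dlm issue mentioned above.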
