[Linux-cluster] Unable to obtain lock

Marc Grimme grimme at atix.de
Thu Mar 22 08:17:04 UTC 2007


Hello,
we have hit the same problem again that I reported in January. We installed
the hotfix, but it did not help.
Once again the whole cluster froze, and no node was allowed to rejoin the
fence domain.
Any ideas, or do you need any more information?
Thanks and regards, Marc.

Mar 22 04:04:03 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:04:06 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:04:31 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:04:33 lilr623b clurgmgrd[12855]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:04:50 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:05:18 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:05:35 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:06:03 lilr623b clurgmgrd[12855]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:06:21 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:06:33 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 04:07:05 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:09:39 lilr623d kernel: CMAN: node lilr623f-ics0 has been removed 
from the cluster : Missed too many heartbeats
Mar 22 07:09:39 lilr623c kernel: CMAN: node lilr623f-ics0 has been removed 
from the cluster : Missed too many heartbeats
Mar 22 07:09:39 lilr623d kernel: dlm: lt_sharedroot: send_cluster_request to 3 
state 1 recovery
Mar 22 07:10:00 lilr623d kernel: CMAN: node lilr623b-ics0 has been removed 
from the cluster : Missed too many heartbeats
Mar 22 07:10:00 lilr623c kernel: CMAN: removing node lilr623b-ics0 from the 
cluster : Missed too many heartbeats
Mar 22 07:10:05 lilr623c kernel: dlm: lt_sharedroot: dlm_dir_rebuild_local 
failed -1
Mar 22 07:10:05 lilr623d kernel: dlm: lt_sharedroot: dlm_dir_rebuild_local 
failed -1
Mar 22 07:10:05 lilr623c kernel: dlm: lt_scratch: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:05 lilr623d kernel: dlm: lt_scratch: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:10 lilr623c kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:10:10 lilr623d kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:10:11 lilr623c kernel: dlm: lt_P06user: dlm_dir_rebuild_local 
failed -1
Mar 22 07:10:11 lilr623d kernel: dlm: lt_P06user: dlm_dir_rebuild_local 
failed -1
Mar 22 07:10:11 lilr623c kernel: dlm: lt_P06user1: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:11 lilr623d kernel: dlm: lt_P06user1: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06sap: dlm_dir_rebuild_local 
failed -1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06origlogA: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:16 lilr623c kernel: dlm: lt_P06sap: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:20 lilr623d kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:21 lilr623c kernel: dlm: lt_P06origlogA: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:21 lilr623c kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:22 lilr623c kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06origlogD: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06mirrlogA: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:27 lilr623c kernel: dlm: lt_P06origlogD: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:27 lilr623c kernel: dlm: lt_P06mirrlogA: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:28 lilr623c kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:28 lilr623c kernel: dlm: lt_P06mirrlogC: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:29 lilr623c kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:30 lilr623c kernel: dlm: lt_P06arch: restbl_rsb_update failed -1
Mar 22 07:10:30 lilr623c kernel: dlm: lt_P06data1: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:30 lilr623d kernel: dlm: lt_P06mirrlogC: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:10:30 lilr623d kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait 
failed 1
Mar 22 07:10:31 lilr623c kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:35 lilr623c kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:10:35 lilr623d kernel: dlm: lt_P06arch: restbl_rsb_update failed -1
Mar 22 07:10:35 lilr623c kernel: dlm: lt_P06data4: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:35 lilr623d kernel: dlm: lt_P06data1: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:36 lilr623d kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:41 lilr623d kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:10:41 lilr623d kernel: dlm: lt_P06data4: dlm_dir_rebuild_wait failed 
1
Mar 22 07:10:41 lilr623c kernel: dlm: clvmd: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:41 lilr623c kernel: dlm: Magma: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:41 lilr623d kernel: dlm: clvmd: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:42 lilr623d kernel: dlm: Magma: dlm_dir_rebuild_wait failed 1
Mar 22 07:11:05 lilr623c fenced[15490]: fencing deferred to lilr623a-ics0
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data3.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data3.2: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data2.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data1.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data1.2: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=5: Busy
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06arch.0: jid=5: Trying 
to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06arch.0: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: 
Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogD.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogC.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=5: Busy
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogC.2: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=4: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogC.0: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: 
Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=5: Busy
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogA.2: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=4: 
Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogD.2: jid=5: 
Trying to acquire journal lock...
....
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_products.2: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: lock_dlm: lm_dlm_cancel 1,2 flags 84
Mar 22 07:11:36 lilr623c kernel: lock_dlm: lm_dlm_cancel skip 1,2 flags 84
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_scratch.2: jid=4: Busy
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogB.2: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogA.2: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06origlogD.0: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogC.0: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06origlogC.0: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user1.0: jid=5: 
Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=5: Done
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=4: Trying 
to acquire journal lock...
Mar 22 07:11:37 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_sharedroot.2: jid=5: 
Acquiring the transaction lock...
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=4: Busy
Mar 22 07:11:37 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogD.2: jid=5: 
Acquiring the transaction lock...
Mar 22 07:11:37 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: 
Acquiring the transaction lock...
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #50: Unable to obtain cluster 
lock: Connection timed out
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_P06data2.0: jid=4: 
Acquiring the transaction lock...
...
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: Done
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=4: 
Trying to acquire journal lock...
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06user1.0: jid=4: Busy
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=4: 
Replayed 0 of 0 blocks
Mar 22 07:11:39 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogC.2: jid=4: 
Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a shutdown: shutting down for system reboot
Mar 22 07:11:39 lilr623a kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait 
failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data4.1: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data1.1: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:39 lilr623d clurgmgrd[20148]: <info> State change: lilr623f-ics0 
DOWN
Mar 22 07:11:39 lilr623e kernel: rh_lkid 2bd03c3
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06origlogD.1: jid=5: 
Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_sharedroot.0: jid=4: 
Busy
Mar 22 07:11:39 lilr623e kernel: lockstate 0
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_scratch.1: jid=5: Busy
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data4.1: jid=4: Busy
Mar 22 07:11:39 lilr623e kernel: rh_cmd 5
Mar 22 07:11:39 lilr623e kernel: nodeid 5
Mar 22 07:11:39 lilr623e kernel: dlm: Magma: reply from 2 no lock
Mar 22 07:11:39 lilr623e kernel: CMAN: node lilr623b-ics0 has been removed 
from the cluster : Missed too many heartbeats
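
To see which nodes log the lock timeouts most often, an excerpt like the one above can be tallied per host. The following is only a minimal sketch: the function name `count_lock_errors` and the regular expression are assumptions modelled on the clurgmgrd lines shown in this mail, not part of rgmanager or any cluster tool. It matches only the first physical line of each wrapped message, so the mail's line wrapping does not matter.

```python
import re
from collections import Counter

# Pattern modelled on the clurgmgrd lines in this mail (an assumption, not an
# official log format): "<Mon> <day> <time> <host> clurgmgrd[pid]: <err> #NN: ..."
LOCK_ERR = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+clurgmgrd\[\d+\]:\s+"
    r"<err>\s+#(?P<code>\d+):\s+Unable to obtain cluster"
)


def count_lock_errors(log_text):
    """Count 'Unable to obtain cluster lock' messages per host (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOCK_ERR.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts


sample = """\
Mar 22 04:04:03 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster
lock: Connection timed out
Mar 22 04:04:06 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster
lock: Connection timed out
Mar 22 04:04:33 lilr623b clurgmgrd[12855]: <err> #50: Unable to obtain cluster
lock: Connection timed out
"""

for host, count in sorted(count_lock_errors(sample).items()):
    print(host, count)
```

Fed the complete /var/log/messages of each node instead of the sample, the same function would give a per-host count for the whole incident.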

On Monday 29 January 2007 19:44:46 Lon Hohberger wrote:
> On Fri, 2007-01-26 at 19:28 +0100, Marc Grimme wrote:
> > On Friday 26 January 2007 19:15, Lon Hohberger wrote:
> > > On Fri, 2007-01-26 at 09:19 +0100, Marc Grimme wrote:
> > > > Hello,
> > > > yesterday we saw a cluster freeze (which seems to come from
> > > > rgmanager) on a RHEL4/U4 GFS installation (see logs) consisting of 6
> > > > x86_64 nodes. After fencing one node the cluster came
> > > > back to life. Any idea what could have happened?
> > >
> > > Check 'dmesg' and 'cman_tool status'.  Also look at /proc/slabinfo,
> > > specifically 'dlm_lkb' bits.  There's a chance that you hit a bug
> > > that's already fixed. :)
> >
> > dlm_lkb           189628 195177    232   17    1 : tunables  120   60    8 : slabdata  11481  11481    384
> > nodea
> > dlm_lkb           2074114 2077587    232   17    1 : tunables  120   60    8 : slabdata 122211 122211    180
> > nodeb
> > dlm_lkb           454319 499392    232   17    1 : tunables  120   60    8 : slabdata  29376  29376      0
> > nodec
> > dlm_lkb           242144 251719    232   17    1 : tunables  120   60    8 : slabdata  14807  14807    480
> > noded
> > dlm_lkb           248672 286382    232   17    1 : tunables  120   60    8 : slabdata  16846  16846    212
> > nodef
> > dlm_lkb            62934  62934    232   17    1 : tunables  120   60    8 : slabdata   3702   3702      0
>
> You've hit "the bug".
>
> > > Need above information (and possibly more) to answer this.
> >
> > What more?? ;-)
>
> Nothing; test packages here:
>
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.i386.rpm
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.x86_64.rpm
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.src.rpm
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
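
The dlm_lkb figures quoted above come from /proc/slabinfo. As a rough sketch of how such a line can be read, the snippet below assumes the slabinfo 2.x column order (name, active_objs, num_objs, objsize, ...). The helper name `dlm_lkb_usage` and the "approx_bytes" estimate are illustrative assumptions, not part of any cluster tool.

```python
def dlm_lkb_usage(slabinfo_text):
    """Return the dlm_lkb counters from slabinfo-style text, or None if absent.

    Assumes the slabinfo 2.x layout:
    name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "dlm_lkb":
            active_objs = int(fields[1])
            objsize = int(fields[3])
            return {
                "active_objs": active_objs,
                "num_objs": int(fields[2]),
                "objsize": objsize,
                # Rough estimate only: ignores slab overhead and free objects.
                "approx_bytes": active_objs * objsize,
            }
    return None


# One of the lines quoted in this thread, with the mail wrapping undone.
sample = (
    "dlm_lkb           2074114 2077587    232   17    1 : tunables  120   60"
    "    8 : slabdata 122211 122211    180"
)

usage = dlm_lkb_usage(sample)
print(usage["active_objs"], "lock blocks,", usage["approx_bytes"] // 2**20, "MiB (approx.)")
```

On a live node the text would come from reading /proc/slabinfo (typically as root), which makes it easy to compare the count across nodes over time.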



-- 
Gruss / Regards,

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/               http://www.open-sharedroot.org/

** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2  (15.-21. of March) **

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany

Registergericht: Amtsgericht München
Registernummer: HRB 131682
USt.-Id.: DE209485962

Geschäftsführung: Marc Grimme, Mark Hlawatschek, Thomas Merz




