[Linux-cluster] Unable to obtain lock
Marc Grimme
grimme at atix.de
Thu Mar 22 08:17:04 UTC 2007
Hello,
Again we had the same problem as reported in January. We installed the hotfix,
but it didn't help.
Again the whole cluster froze, and no node was allowed to rejoin the
fence domain.
Any ideas, or do you need more information?
Thanks and regards, Marc.
Mar 22 04:04:03 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:04:06 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 04:04:31 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:04:33 lilr623b clurgmgrd[12855]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 04:04:50 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:05:18 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:05:35 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 04:06:03 lilr623b clurgmgrd[12855]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 04:06:21 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:06:33 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 04:07:05 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 07:09:39 lilr623d kernel: CMAN: node lilr623f-ics0 has been removed from the cluster : Missed too many heartbeats
Mar 22 07:09:39 lilr623c kernel: CMAN: node lilr623f-ics0 has been removed from the cluster : Missed too many heartbeats
Mar 22 07:09:39 lilr623d kernel: dlm: lt_sharedroot: send_cluster_request to 3 state 1 recovery
Mar 22 07:10:00 lilr623d kernel: CMAN: node lilr623b-ics0 has been removed from the cluster : Missed too many heartbeats
Mar 22 07:10:00 lilr623c kernel: CMAN: removing node lilr623b-ics0 from the cluster : Missed too many heartbeats
Mar 22 07:10:05 lilr623c kernel: dlm: lt_sharedroot: dlm_dir_rebuild_local failed -1
Mar 22 07:10:05 lilr623d kernel: dlm: lt_sharedroot: dlm_dir_rebuild_local failed -1
Mar 22 07:10:05 lilr623c kernel: dlm: lt_scratch: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:05 lilr623d kernel: dlm: lt_scratch: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:10 lilr623c kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:10:10 lilr623d kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:10:11 lilr623c kernel: dlm: lt_P06user: dlm_dir_rebuild_local failed -1
Mar 22 07:10:11 lilr623d kernel: dlm: lt_P06user: dlm_dir_rebuild_local failed -1
Mar 22 07:10:11 lilr623c kernel: dlm: lt_P06user1: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:11 lilr623d kernel: dlm: lt_P06user1: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06sap: dlm_dir_rebuild_local failed -1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06origlogA: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:15 lilr623d kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:16 lilr623c kernel: dlm: lt_P06sap: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:20 lilr623d kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:21 lilr623c kernel: dlm: lt_P06origlogA: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:21 lilr623c kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:22 lilr623c kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06origlogD: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06mirrlogA: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:25 lilr623d kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:27 lilr623c kernel: dlm: lt_P06origlogD: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:27 lilr623c kernel: dlm: lt_P06mirrlogA: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:28 lilr623c kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:28 lilr623c kernel: dlm: lt_P06mirrlogC: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:29 lilr623c kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:30 lilr623c kernel: dlm: lt_P06arch: restbl_rsb_update failed -1
Mar 22 07:10:30 lilr623c kernel: dlm: lt_P06data1: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:30 lilr623d kernel: dlm: lt_P06mirrlogC: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:30 lilr623d kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:31 lilr623c kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:35 lilr623c kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:10:35 lilr623d kernel: dlm: lt_P06arch: restbl_rsb_update failed -1
Mar 22 07:10:35 lilr623c kernel: dlm: lt_P06data4: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:35 lilr623d kernel: dlm: lt_P06data1: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:36 lilr623d kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:41 lilr623d kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:10:41 lilr623d kernel: dlm: lt_P06data4: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:41 lilr623c kernel: dlm: clvmd: dlm_dir_rebuild_wait failed -1
Mar 22 07:10:41 lilr623c kernel: dlm: Magma: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:41 lilr623d kernel: dlm: clvmd: dlm_dir_rebuild_wait failed 1
Mar 22 07:10:42 lilr623d kernel: dlm: Magma: dlm_dir_rebuild_wait failed 1
Mar 22 07:11:05 lilr623c fenced[15490]: fencing deferred to lilr623a-ics0
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data3.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data3.2: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data2.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data1.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data1.2: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=5: Busy
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06arch.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06arch.0: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogD.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogC.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=5: Busy
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogC.2: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=4: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogC.0: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: Looking at journal...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=5: Busy
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogA.2: jid=5: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=4: Trying to acquire journal lock...
Mar 22 07:11:35 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogD.2: jid=5: Trying to acquire journal lock...
....
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_products.2: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06data3.0: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: lock_dlm: lm_dlm_cancel 1,2 flags 84
Mar 22 07:11:36 lilr623c kernel: lock_dlm: lm_dlm_cancel skip 1,2 flags 84
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogB.2: jid=5: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_scratch.2: jid=4: Busy
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogB.2: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623c kernel: GFS: fsid=lilr623:lt_P06mirrlogA.2: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06origlogD.0: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06mirrlogC.0: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06origlogC.0: jid=4: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user1.0: jid=5: Acquiring the transaction lock...
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=5: Done
Mar 22 07:11:36 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=4: Trying to acquire journal lock...
Mar 22 07:11:37 lilr623a clurgmgrd[20754]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #48: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_sharedroot.2: jid=5: Acquiring the transaction lock...
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_P06user.0: jid=4: Busy
Mar 22 07:11:37 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogD.2: jid=5: Acquiring the transaction lock...
Mar 22 07:11:37 lilr623a clurgmgrd[20754]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623c kernel: GFS: fsid=lilr623:lt_P06data4.2: jid=4: Acquiring the transaction lock...
Mar 22 07:11:37 lilr623e clurgmgrd[20331]: <err> #50: Unable to obtain cluster lock: Connection timed out
Mar 22 07:11:37 lilr623d kernel: GFS: fsid=lilr623:lt_P06data2.0: jid=4: Acquiring the transaction lock...
...
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=5: Done
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data4.0: jid=4: Trying to acquire journal lock...
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06user1.0: jid=4: Busy
Mar 22 07:11:38 lilr623d kernel: GFS: fsid=lilr623:lt_P06data1.0: jid=4: Replayed 0 of 0 blocks
Mar 22 07:11:39 lilr623c kernel: GFS: fsid=lilr623:lt_P06origlogC.2: jid=4: Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a shutdown: shutting down for system reboot
Mar 22 07:11:39 lilr623a kernel: dlm: lt_products: restbl_rsb_update failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06origlogB: dlm_dir_rebuild_wait failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06origlogC: dlm_dir_rebuild_wait failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06mirrlogB: dlm_dir_rebuild_wait failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06mirrlogD: dlm_dir_rebuild_wait failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06data2: dlm_dir_rebuild_wait failed -1
Mar 22 07:11:39 lilr623a kernel: dlm: lt_P06data3: restbl_rsb_update failed -1
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data4.1: jid=5: Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data1.1: jid=5: Trying to acquire journal lock...
Mar 22 07:11:39 lilr623d clurgmgrd[20148]: <info> State change: lilr623f-ics0 DOWN
Mar 22 07:11:39 lilr623e kernel: rh_lkid 2bd03c3
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06origlogD.1: jid=5: Trying to acquire journal lock...
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_sharedroot.0: jid=4: Busy
Mar 22 07:11:39 lilr623e kernel: lockstate 0
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_scratch.1: jid=5: Busy
Mar 22 07:11:39 lilr623a kernel: GFS: fsid=lilr623:lt_P06data4.1: jid=4: Busy
Mar 22 07:11:39 lilr623e kernel: rh_cmd 5
Mar 22 07:11:39 lilr623e kernel: nodeid 5
Mar 22 07:11:39 lilr623e kernel: dlm: Magma: reply from 2 no lock
Mar 22 07:11:39 lilr623e kernel: CMAN: node lilr623b-ics0 has been removed from the cluster : Missed too many heartbeats
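To see which nodes and lockspaces dominate an excerpt like the one above, the log can be tallied mechanically. The following is a minimal Python sketch, assuming the raw syslog file has one entry per line (not wrapped as in mail) and uses the message formats shown above. The function name `summarize` and the regexes are hypothetical helpers, not part of rgmanager or the cluster tools, and the sketch does not try to interpret what the rgmanager codes #48 and #50 mean.

```python
import re
from collections import Counter

# rgmanager lock errors, e.g.
# "Mar 22 04:04:03 lilr623b clurgmgrd[12855]: <err> #48: Unable to obtain cluster lock: ..."
RGMGR_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+clurgmgrd\[\d+\]:\s+<err>\s+"
    r"(?P<code>#\d+): Unable to obtain cluster lock"
)

# DLM recovery failures, e.g.
# "Mar 22 07:10:05 lilr623c kernel: dlm: lt_scratch: dlm_dir_rebuild_wait failed 1"
DLM_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+dlm:\s+"
    r"(?P<lockspace>[^:\s]+):\s+(?P<func>\w+) failed (?P<rc>-?\d+)"
)


def summarize(lines):
    """Count lock timeouts per (host, code) and DLM failures per (lockspace, function)."""
    lock_errors = Counter()
    dlm_failures = Counter()
    for line in lines:
        m = RGMGR_RE.match(line)
        if m:
            lock_errors[(m["host"], m["code"])] += 1
            continue
        m = DLM_RE.match(line)
        if m:
            dlm_failures[(m["lockspace"], m["func"])] += 1
    return lock_errors, dlm_failures
```

For example, `summarize(open("/var/log/messages"))` returns two `Counter` objects; calling `most_common()` on each lists the busiest node/code and lockspace/function pairs first. Lines that match neither pattern (GFS journal recovery, CMAN membership changes) are simply ignored.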
On Monday 29 January 2007 19:44:46 Lon Hohberger wrote:
> On Fri, 2007-01-26 at 19:28 +0100, Marc Grimme wrote:
> > On Friday 26 January 2007 19:15, Lon Hohberger wrote:
> > > On Fri, 2007-01-26 at 09:19 +0100, Marc Grimme wrote:
> > > > Hello,
> > > > Yesterday we saw a cluster freeze (which seems to come from rgmanager)
> > > > on a 6-node x86_64 cluster running RHEL4 U4 with GFS (see logs).
> > > > After fencing one node, the cluster came back to life. Any idea what
> > > > could have happened?
> > >
> > > Check 'dmesg' and 'cman_tool status'. Also look at /proc/slabinfo,
> > > specifically 'dlm_lkb' bits. There's a chance that you hit a bug
> > > that's already fixed. :)
> >
> > dlm_lkb 189628 195177 232 17 1 : tunables 120 60 8 : slabdata 11481 11481 384
> > nodea
> > dlm_lkb 2074114 2077587 232 17 1 : tunables 120 60 8 : slabdata 122211 122211 180
> > nodeb
> > dlm_lkb 454319 499392 232 17 1 : tunables 120 60 8 : slabdata 29376 29376 0
> > nodec
> > dlm_lkb 242144 251719 232 17 1 : tunables 120 60 8 : slabdata 14807 14807 480
> > noded
> > dlm_lkb 248672 286382 232 17 1 : tunables 120 60 8 : slabdata 16846 16846 212
> > nodef
> > dlm_lkb 62934 62934 232 17 1 : tunables 120 60 8 : slabdata 3702 3702 0
>
> You've hit "the bug".
>
> > > Need above information (and possibly more) to answer this.
> >
> > What more?? ;-)
>
> Nothing; test packages here:
>
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.i386.rpm
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.x86_64.rpm
> http://people.redhat.com/lhh/rgmanager-1.9.54-2.218112hf.src.rpm
>
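The `dlm_lkb` figures quoted above can be converted into memory usage to compare nodes. This is a minimal sketch, assuming the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, followed by the `tunables` and `slabdata` sections), which matches the quoted lines; the helper names are hypothetical. For the largest quoted line, 2074114 active objects of 232 bytes come to 481194448 bytes (about 459 MiB) of lock blocks, against roughly 14 MiB for the line with 62934 objects. The sketch only does that arithmetic; it does not decide whether a given count indicates the bug.

```python
from typing import NamedTuple, Optional


class SlabStats(NamedTuple):
    """Leading fields of one /proc/slabinfo line (slabinfo 2.x layout assumed)."""
    name: str
    active_objs: int
    num_objs: int
    objsize: int        # bytes per object
    objperslab: int
    pagesperslab: int


def parse_slabinfo_line(line: str) -> SlabStats:
    # The first six whitespace-separated fields precede the ": tunables" section.
    fields = line.split()
    active, total, size, per_slab, pages = (int(f) for f in fields[1:6])
    return SlabStats(fields[0], active, total, size, per_slab, pages)


def active_bytes(stats: SlabStats) -> int:
    # Memory held by objects currently in use (excludes free slots and slab overhead).
    return stats.active_objs * stats.objsize


def read_slab(name: str, path: str = "/proc/slabinfo") -> Optional[SlabStats]:
    # Reading /proc/slabinfo may require root on some kernels.
    with open(path) as f:
        for line in f:
            if line.split(maxsplit=1)[:1] == [name]:
                return parse_slabinfo_line(line)
    return None


if __name__ == "__main__":
    quoted = "dlm_lkb 2074114 2077587 232 17 1 : tunables 120 60 8 : slabdata 122211 122211 180"
    stats = parse_slabinfo_line(quoted)
    print(stats.name, active_bytes(stats))  # prints: dlm_lkb 481194448
```

On a live node, `read_slab("dlm_lkb")` returns the same fields directly; sampling it periodically would show whether the count keeps growing.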
--
Gruss / Regards,
Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/ http://www.open-sharedroot.org/
** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2 (15.-21. of March) **
**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
Register court: Amtsgericht München
Registration number: HRB 131682
VAT ID: DE209485962
Managing directors: Marc Grimme, Mark Hlawatschek, Thomas Merz