[Linux-cluster] kernel panic about lock_dlm

孙俊伟 sunjw at onewaveinc.com
Mon Apr 3 09:51:36 UTC 2006


Hi, everyone

I am using kernel 2.6.15-rc7 and the STABLE branch of GFS from CVS,
checked out when 2.6.15-rc7 was the newest kernel.
I set up a 4-node GFS cluster, but after about 4 days it stopped
working. /var/log/messages showed the following:
<--
Mar 28 15:31:29 nd05 kernel: d 1 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 update remastered resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 updated 0 resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuild locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuilt 0 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 recover event 11 done
Mar 28 15:31:29 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 8,11,11
Mar 28 15:31:29 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 11 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 1,0,0 ids 11,11,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,1,0 ids 11,14,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move use event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 add node 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 total nodes 4
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuild resource directory
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuilt 1552 resources
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purge requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purged 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 mark waiting requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 marked 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 done
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 11,14,14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id 9190386 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id eab0065 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 unlock fb040350 no id
Mar 28 15:31:30 nd05 kernel: recovery_done jid 3 msg 309 a
Mar 28 15:31:30 nd05 kernel: 3961 recovery_done nodeid 4 flg 18
Mar 28 15:31:30 nd05 kernel: 3977 pr_start last_stop 3 last_start 4 last_finish 3
Mar 28 15:31:31 nd05 kernel: 3977 pr_start count 3 type 3 event 4 flags 21a
Mar 28 15:31:31 nd05 kernel: 3977 pr_start 4 done 1
Mar 28 15:31:31 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13415b4b id 163005c 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13425b42 id 180002f 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13435b39 id 1a00360 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13445b30 id 1760186 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13455b27 id 17a038b 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13465b1e id 15a01a8 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13475b15 id 1910380 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13485b0c id 1880309 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13605a34 id 16f00aa 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13615a2b id 17400e1 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13625a22 id 16b03c1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13635a19 id 16b03ad 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13645a10 id 17e03d4 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13655a07 id 18202c0 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136659fe id 170036c 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136759f5 id 155031c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136859ec id 1660212 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136959e3 id 15c0114 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136a59da id 15a038f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136b59d1 id 17600bb 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136c59c8 id 1a20336 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136d59bf id 171003c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136e59b6 id 1500008 3,0
Mar 28 15:31:32 nd05 kernel: 3976 pr_start last_stop 4 last_start 9 last_finish 4
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 4 type 2 event 9 flags 21a
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,136f59ad id 15e026f 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,137059a4 id 170017e 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1371599b id 16b01e3 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13725992 id 18000a2 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13735989 id 177017c 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13745980 id 16d035a 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13755977 id 18102d6 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1376596e id 1740020 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13775965 id 1780207 3,0
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 9 done 1
Mar 28 15:31:33 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start last_stop 9 last_start 10 last_finish 9
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 3 type 3 event 10 flags 21a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 10 done 1
Mar 28 15:31:33 nd05 kernel: 3977 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,370232 id 23a010e 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,380229 id 2630143 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,390220 id 29f0338 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3a0217 id 2850133 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3b020e id 268035b 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3c0205 id 2710344 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3d01fc id 27701f4 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3e01f3 id 28203f7 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3f01ea id 236011f 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4001e1 id 25e0387 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4101d8 id 2810157 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4201cf id 248035a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4301c6 id 24d0297 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4401bd id 2920280 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4501b4 id 267000b 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4601ab id 263012c 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4701a2 id 2930281 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,480199 id 28e028d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,490190 id 243031a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4a0187 id 259000d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4b017e id 2650370 3,0
Mar 28 15:31:35 nd05 kernel: 3976 pr_start last_stop 10 last_start 15 last_finish 10
Mar 28 15:31:35 nd05 kernel: 3976 pr_start count 4 type 2 event 15 flags 21a
Mar 28 15:31:35 nd05 kernel: 3976 pr_start 15 done 1
Mar 28 15:31:35 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:35 nd05 kernel: 
Mar 28 15:31:35 nd05 kernel: lock_dlm:  Assertion failed on line 357 of file /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c
Mar 28 15:31:35 nd05 kernel: lock_dlm:  assertion:  "!error"
Mar 28 15:31:35 nd05 kernel: lock_dlm:  time = 79185725
Mar 28 15:31:35 nd05 kernel: gfs-sda1: error=-22 num=3,133b5b81 lkf=9 flags=84
Mar 28 15:31:35 nd05 kernel: 
Mar 28 15:31:37 nd05 kernel: ------------[ cut here ]------------
Mar 28 15:31:37 nd05 kernel: kernel BUG at /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c:357!
Mar 28 15:31:37 nd05 kernel: invalid operand: 0000 [#1]
Mar 28 15:31:37 nd05 kernel: SMP 
Mar 28 15:31:37 nd05 kernel: Modules linked in: lock_dlm dlm cman gfs lock_harness ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler binfmt_misc dm_mirror dm_round_robin dm_multipath dm_mod video thermal processor fan button battery ac uhci_hcd usbcore hw_random shpchp pci_hotplug e1000 bonding qla2300 qla2xxx scsi_transport_fc sd_mod
Mar 28 15:31:37 nd05 kernel: CPU:    1
Mar 28 15:31:37 nd05 kernel: EIP:    0060:[<f89e9556>]    Not tainted VLI
Mar 28 15:31:37 nd05 kernel: EFLAGS: 00010282   (2.6.15-rc7smp) 
Mar 28 15:31:37 nd05 kernel: EIP is at do_dlm_unlock+0x8f/0xa4 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: eax: 00000004   ebx: f560c180   ecx: f5cf7f10   edx: f89edf11
Mar 28 15:31:37 nd05 kernel: esi: ffffffea   edi: f8a7f000   ebp: f8a61580   esp: f5cf7f0c
Mar 28 15:31:37 nd05 kernel: ds: 007b   es: 007b   ss: 0068
Mar 28 15:31:37 nd05 kernel: Process gfs_glockd (pid: 3979, threadinfo=f5cf6000 task=f6735030)
Mar 28 15:31:37 nd05 kernel: Stack: f89edf11 f8a7f000 f55517b0 f89e97f0 f560c180 f8a3c64f f560c180 00000003 
Mar 28 15:31:37 nd05 kernel:        f55517d4 f8a329d8 f8a7f000 f560c180 00000003 f55517b0 f8a61580 f55517b0 
Mar 28 15:31:37 nd05 kernel:        f8a7f000 f8a31f28 f55517b0 f55517b0 00000001 f8a31fdc d82c34c0 f55517b0 
Mar 28 15:31:37 nd05 kernel: Call Trace:
Mar 28 15:31:37 nd05 kernel:  [<f89e97f0>] lm_dlm_unlock+0x19/0x20 [lock_dlm]
Mar 28 15:31:37 nd05 kernel:  [<f8a3c64f>] gfs_lm_unlock+0x2c/0x43 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a329d8>] gfs_glock_drop_th+0xe8/0x122 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a31f28>] rq_demote+0x76/0x92 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a31fdc>] run_queue+0x54/0xb5 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a320f4>] unlock_on_glock+0x1d/0x24 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a34013>] gfs_reclaim_glock+0xbd/0x135 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<f8a28734>] gfs_glockd+0x3a/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel:  [<c010328a>] ret_from_fork+0x6/0x14
Mar 28 15:31:37 nd05 kernel:  [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel:  [<f8a286fa>] gfs_glockd+0x0/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel:  [<c0101ab5>] kernel_thread_helper+0x5/0xb
Mar 28 15:31:37 nd05 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 09 e0 9e f8 e8 ac 14 73 c7 83 c4 34 68 11 df 9e f8 e8 9f 14 73 c7 <0f> 0b 65 01 58 de 9e f8 68 13 df 9e f8 e8 23 0d 73 c7 5b 5e c3
-->
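For reference when reading the trace: `error=-22` in the lock_dlm assertion message and `esi: ffffffea` in the register dump are likely the same value. As a 32-bit two's-complement integer, 0xffffffea is -22, and a negative 22 is conventionally -EINVAL ("Invalid argument") on Linux. The minimal Python sketch below only performs that decoding. The helper `decode_negative_errno` is hypothetical and is not part of GFS or any cluster tool. It says nothing about *why* the DLM rejected the unlock.

```python
import errno
import os


def decode_negative_errno(value: int, bits: int = 32) -> tuple[int, str, str]:
    """Interpret a raw register value or a negative return code as -errno.

    Hypothetical helper for reading oops output; not part of GFS/DLM.
    """
    # Sign-extend a raw register value (e.g. esi: ffffffea) from `bits` bits.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    code = -value  # kernel functions return -errno on failure
    name = errno.errorcode.get(code, "UNKNOWN")
    return value, name, os.strerror(code)


if __name__ == "__main__":
    # esi from the register dump, then error= from the assertion message
    print(decode_negative_errno(0xffffffea))  # -> (-22, 'EINVAL', 'Invalid argument') on Linux
    print(decode_negative_errno(-22))         # -> (-22, 'EINVAL', 'Invalid argument') on Linux
```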

What could be causing this?
Thanks in advance for any replies!
Luckey