[Linux-cluster] dlm caused a kernel panic

Jeff Dinisco jeff at jettis.com
Wed Dec 14 02:18:15 UTC 2005


I'm running FC4 (2.6.13-1.1532_FC4smp), dlm-1.0.0-3, and GFS-6.1.0-3 on a
3-node cluster.  The df command has always been very slow to return
output on my GFS-mounted filesystems.  Here is the series of events:

16:20:00 - node01 was out of the cluster; node02 and node03 were active
with two GFS filesystems mounted
16:22:10 - after node01 rejoined the cluster, both filesystems were
successfully mounted on it
16:22:37 - a df command was attempted by a monitoring script
16:22:54 - I executed /etc/init.d/gfs stop and it failed because one of
the filesystems was busy and could not be unmounted (the df command
above, which ended up hanging, may have been the cause; see the checks
sketched after this timeline)
16:22:55 - node02 and node03 panicked and were not properly fenced
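
In case it matters, this is roughly what I check to see what is holding
a GFS mount busy and whether the cluster still looks healthy before
forcing a stop.  A rough sketch only, assuming the cluster 1.x tools;
/mnt/gfs00 and /mnt/gfs01 stand in for my real mount points:

  # show which processes are keeping each filesystem busy
  # (the hung df should show up here)
  fuser -vm /mnt/gfs00
  fuser -vm /mnt/gfs01

  # cluster membership and service state around the umount attempt
  cman_tool status
  cman_tool nodes
  cat /proc/cluster/services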

Log messages from node02 at the time of the panic:

Dec 13 16:22:55 node02 kernel:  event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel:  event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel: gfs00 rebuild resource directory
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 1913 resources
Dec 13 16:22:55 node02 kernel:  event 22 done
Dec 13 16:22:55 node02 kernel: gfs01 move flags 0,0,1 ids 15,22,22
Dec 13 16:22:55 node02 kernel: gfs01 process held requests
Dec 13 16:22:55 node02 kernel: gfs01 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs01 resent 0 requests
Dec 13 16:22:55 node02 kernel: gfs01 recover event 22 finished
Dec 13 16:22:55 node02 kernel: gfs00 move flags 1,0,0 ids 20,20,20
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,1,0 ids 20,25,20
Dec 13 16:22:55 node02 kernel: gfs00 move use event 25
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25
Dec 13 16:22:55 node02 kernel: gfs00 remove node 1
Dec 13 16:22:55 node02 kernel: gfs00 total nodes 2
Dec 13 16:22:55 node02 kernel: gfs00 rebuild resource directory
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 1913 resources
Dec 13 16:22:55 node02 kernel: gfs00 purge requests
Dec 13 16:22:55 node02 kernel: gfs00 purged 0 requests
Dec 13 16:22:55 node02 kernel: gfs00 mark waiting requests
Dec 13 16:22:55 node02 kernel: gfs00 mark 2900192 lq 4 nodeid 1
Dec 13 16:22:55 node02 kernel: gfs00 mark 2900192 unlock no rep
Dec 13 16:22:55 node02 kernel: gfs00 marked 1 requests
Dec 13 16:22:55 node02 kernel: gfs00 purge locks of departed nodes
Dec 13 16:22:55 node02 kernel: gfs00 purged 1 locks
Dec 13 16:22:55 node02 kernel: gfs00 update remastered resources
Dec 13 16:22:55 node02 kernel: gfs00 updated 1 resources
Dec 13 16:22:55 node02 kernel: gfs00 rebuild locks
Dec 13 16:22:55 node02 kernel: gfs00 rebuilt 0 locks
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25 done
Dec 13 16:22:55 node02 kernel: gfs00 move flags 0,0,1 ids 20,25,25
Dec 13 16:22:55 node02 kernel: gfs00 process held requests
Dec 13 16:22:55 node02 kernel: gfs00 processed 0 requests
Dec 13 16:22:55 node02 kernel: gfs00 resend marked requests
Dec 13 16:22:55 node02 kernel: gfs00 resend 2900192 lq 4 flg 3080000 node 2/2 "withdraw 1"
Dec 13 16:22:55 node02 kernel: gfs00 unlock done 2900192
Dec 13 16:22:55 node02 kernel: gfs00 resent 1 requests
Dec 13 16:22:55 node02 kernel: gfs00 recover event 25 finished
Dec 13 16:22:55 node02 kernel:
Dec 13 16:22:55 node02 kernel: DLM:  Assertion failed on line 1007 of file /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c
Dec 13 16:22:55 node02 kernel: DLM:  assertion:  "lkb"
Dec 13 16:22:56 node02 kernel: DLM:  time = 6642223
Dec 13 16:22:56 node02 kernel: dlm: reply
Dec 13 16:22:56 node02 kernel: rh_cmd 5
Dec 13 16:22:56 node02 kernel: rh_lkid 2900192
Dec 13 16:22:56 node02 kernel: lockstate 4137259392
Dec 13 16:22:56 node02 kernel: nodeid 3224043367
Dec 13 16:22:56 node02 kernel: status 4294901758
Dec 13 16:22:56 node02 kernel: lkid 4040
Dec 13 16:22:56 node02 kernel: nodeid 1
Dec 13 16:22:56 node02 kernel:
Dec 13 16:22:56 node02 kernel: ------------[ cut here ]------------
Dec 13 16:22:56 node02 kernel: kernel BUG at /usr/src/build/627959-i686/BUILD/smp/src/lockqueue.c:1007!
Dec 13 16:22:56 node02 kernel: invalid operand: 0000 [#1]
Dec 13 16:22:56 node02 kernel: SMP
Dec 13 16:22:56 node02 kernel: Modules linked in: autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) ipv6 crc32c libcrc32c iscsi_sfnet(U) scsi_transport_iscsi dm_mod video button battery ac uhci_hcd ehci_hcd shpchp e100 mii e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm sd_mod scsi_mod
Dec 13 16:22:56 node02 kernel: CPU:    3
Dec 13 16:22:56 node02 kernel: EIP:    0060:[<f8b66d09>]    Tainted: GF VLI
Dec 13 16:22:56 node02 kernel: EFLAGS: 00010292   (2.6.13-1.1532_FC4smp)
Dec 13 16:22:56 node02 kernel: EIP is at process_cluster_request+0x9b9/0xdfa [dlm]
Dec 13 16:22:56 node02 kernel: eax: 00000004   ebx: 00000000   ecx: c036fc2c   edx: 00000286
Dec 13 16:22:56 node02 kernel: esi: f6d35200   edi: 00000000   ebp: f6035ed4   esp: f6035e24
Dec 13 16:22:56 node02 kernel: ds: 007b   es: 007b   ss: 0068
Dec 13 16:22:56 node02 kernel: Process dlm_recvd (pid: 2939, threadinfo=f6035000 task=f7916020)
Dec 13 16:22:56 node02 kernel: Stack: badc0ded f8b73a44 00000001 f8b74a9c f8b73a40 00655a2f 00000001 00000040
Dec 13 16:22:56 node02 kernel:        00004000 f6035e48 00000000 c039f100 00001000 f3c4bb80 c02aff67 00001000
Dec 13 16:22:56 node02 kernel:        00004040 00000000 f8b6f617 00000000 00000001 ffffffff 00000000 f7ef84bc
Dec 13 16:22:56 node02 kernel: Call Trace:
Dec 13 16:22:56 node02 kernel:  [<c02aff67>] sock_recvmsg+0x103/0x11e
Dec 13 16:22:56 node02 kernel:  [<f8b6f617>] process_reply_async+0x1d/0x23 [dlm]
Dec 13 16:22:56 node02 kernel:  [<f8b6a6d1>] copy_from_cb+0x25/0x5d [dlm]
Dec 13 16:22:56 node02 kernel:  [<f8b6a95b>] midcomms_process_incoming_buffer+0x13b/0x25f [dlm]
Dec 13 16:22:56 node02 kernel:  [<c02aff67>] sock_recvmsg+0x103/0x11e
Dec 13 16:22:56 node02 kernel:  [<f8b6880f>] receive_from_sock+0x19b/0x2ce [dlm]
Dec 13 16:22:56 node02 kernel:  [<c03166e3>] schedule+0x563/0xb8e
Dec 13 16:22:56 node02 kernel:  [<c0105f15>] do_IRQ+0x55/0x86
Dec 13 16:22:56 node02 kernel:  [<f8b69949>] dlm_recvd+0x0/0xa1 [dlm]
Dec 13 16:22:56 node02 kernel:  [<f8b69777>] process_sockets+0x80/0xda [dlm]
Dec 13 16:22:56 node02 kernel:  [<f8b699b9>] dlm_recvd+0x70/0xa1 [dlm]
Dec 13 16:22:56 node02 kernel:  [<c01343d9>] kthread+0x93/0x97
Dec 13 16:22:56 node02 kernel:  [<c0134346>] kthread+0x0/0x97
Dec 13 16:22:56 node02 kernel:  [<c0101ca1>] kernel_thread_helper+0x5/0xb
Dec 13 16:22:56 node02 kernel: Code: 65 a9 5b c7 89 e8 e8 6a bd 00 00 8b 54 24 14 89 54 24 04 c7 04 24 96 3b b7 f8 e8 4a a9 5b c7 c7 04 24 44 3a b7 f8 e8 3e a9 5b c7 <0f> 0b ef 03 9c 4a b7 f8 c7 04 24 2c 4b b7 f8 e8 76 9f 5b c7 e8
Dec 13 16:22:56 node02 kernel:  <0>Fatal exception: panic in 5 seconds
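
If more state from a surviving node would help, this is what I plan to
grab the next time it happens.  Just a sketch, assuming the
/proc/cluster files and gfs_tool actions in this dlm-kernel/GFS build
are available; the mount point is a placeholder:

  # DLM / service manager state
  cat /proc/cluster/services
  cat /proc/cluster/dlm_debug

  # per-filesystem lock state
  gfs_tool counters /mnt/gfs00
  gfs_tool lockdump /mnt/gfs00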

Any help would be greatly appreciated.

 - Jeff

