[Linux-cluster] Occasional kernel panics

Ethan Sommer sommere at gac.edu
Fri Oct 21 16:38:13 UTC 2005


Every few days or so our cluster machines have kernel panics 
complaining about GFS locking (although it's pretty irregular; we once 
went a few weeks without an outage).

When we were serving afp off the cluster, this happened a LOT, and it 
was reproducible when certain users accessed files. We have since 
changed things so that afp runs on a separate server which NFS-mounts 
the cluster.

We are running FC4 with the gfs modules from yum.


Here are our two most recent kernel panics, followed by one from when we 
had afp running on the cluster. (It looks like there may be relevant info 
above the cut-here line as well, if that would be helpful.)

Panic 1:



Oct 19 14:44:41 meow kernel: ------------[ cut here ]------------
Oct 19 14:44:41 meow kernel: kernel BUG at /usr/src/build/607755-i686/BUILD/smp/src/lockqueue.c:1144!
Oct 19 14:44:41 meow kernel: invalid operand: 0000 [#1]
Oct 19 14:44:41 meow kernel: SMP
Oct 19 14:44:41 meow kernel: Modules linked in: nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) md5 ipv6 sunrpc ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 floppy ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Oct 19 14:44:41 meow kernel: CPU:    1
Oct 19 14:44:41 meow kernel: EIP:    0060:[<f8af9dcf>]    Not tainted VLI
Oct 19 14:44:41 meow kernel: EFLAGS: 00010292   (2.6.12-1.1447_FC4smp)
Oct 19 14:44:41 meow kernel: EIP is at process_cluster_request+0xddb/0xdef [dlm]
Oct 19 14:44:41 meow kernel: eax: 00000004   ebx: 00000000   ecx: c035fa4c   edx: 00000286
Oct 19 14:44:41 meow kernel: esi: f7fb8400   edi: 00000000   ebp: d2988000   esp: f7eefe24
Oct 19 14:44:41 meow kernel: ds: 007b   es: 007b   ss: 0068
Oct 19 14:44:41 meow kernel: Process dlm_recvd (pid: 2402, threadinfo=f7eef000 task=f7851020)
Oct 19 14:44:41 meow kernel: Stack: f8b0621b 00000001 f8b071e0 f8b06217 2583f987 00000001 00000040 00004000
Oct 19 14:44:41 meow kernel:        f7eefe48 00000000 c038e1a0 00000a58 f0167b00 c02a26c1 00000a58 00004040
Oct 19 14:44:41 meow kernel:        00000072 f7eefed4 00000000 00000001 00000246 00000000 edd6eeb8 00000000
Oct 19 14:44:41 meow kernel: Call Trace:
Oct 19 14:44:41 meow kernel:  [<c02a26c1>] sock_recvmsg+0x103/0x11e
Oct 19 14:44:41 meow kernel:  [<f8afd46b>] midcomms_process_incoming_buffer+0x13b/0x25f [dlm]
Oct 19 14:44:41 meow kernel:  [<c011ce54>] load_balance_newidle+0x23/0x82
Oct 19 14:44:41 meow kernel:  [<f8afb3d3>] receive_from_sock+0x196/0x2c9 [dlm]
Oct 19 14:44:41 meow kernel:  [<c0307705>] schedule+0x405/0xc5e
Oct 19 14:44:41 meow kernel:  [<c0307731>] schedule+0x431/0xc5e
Oct 19 14:44:41 meow kernel:  [<f8afc457>] dlm_recvd+0x0/0x9c [dlm]
Oct 19 14:44:41 meow kernel:  [<f8afc2d3>] process_sockets+0x75/0xb7 [dlm]
Oct 19 14:44:41 meow kernel:  [<f8afc4c7>] dlm_recvd+0x70/0x9c [dlm]
Oct 19 14:44:41 meow kernel:  [<c0134c09>] kthread+0x93/0x97
Oct 19 14:44:41 meow kernel:  [<c0134b76>] kthread+0x0/0x97
Oct 19 14:44:41 meow kernel:  [<c01023d1>] kernel_thread_helper+0x5/0xb
Oct 19 14:44:41 meow kernel: Code: 4f 82 62 c7 89 e8 e8 b1 b4 00 00 8b 4c 24 14 89 4c 24 04 c7 04 24 6d 63 b0 f8 e8 34 82 62 c7 c7 04 24 1b 62 b0 f8 e8 28 82 62 c7 <0f> 0b 78 04 e0 71 b0 f8 c7 04 24 70 72 b0 f8 e8 40 78 62 c7 57
Oct 19 14:44:41 meow kernel:  <0>Fatal exception: panic in 5 seconds


Panic 2:

Oct 10 09:58:39 woof kernel: ------------[ cut here ]------------
Oct 10 09:58:39 woof kernel: kernel BUG at /usr/src/build/607778-i686/BUILD/smp/src/dlm/lock.c:411!
Oct 10 09:58:39 woof kernel: invalid operand: 0000 [#1]
Oct 10 09:58:39 woof kernel: SMP
Oct 10 09:58:39 woof kernel: Modules linked in: nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) md5 ipv6 sunrpc ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Oct 10 09:58:39 woof kernel: CPU:    1
Oct 10 09:58:39 woof kernel: EIP:    0060:[<f8b98bf5>]    Not tainted VLI
Oct 10 09:58:39 woof kernel: EFLAGS: 00010292   (2.6.12-1.1447_FC4smp)
Oct 10 09:58:39 woof kernel: EIP is at do_dlm_lock+0x1b7/0x21d [lock_dlm]
Oct 10 09:58:39 woof kernel: eax: 00000004   ebx: 00000000   ecx: c035fa4c   edx: 00000292
Oct 10 09:58:39 woof kernel: esi: f7848140   edi: ffffffea   ebp: 00000003   esp: c74b3cfc
Oct 10 09:58:39 woof kernel: ds: 007b   es: 007b   ss: 0068
Oct 10 09:58:39 woof kernel: Process imapd (pid: 24278, threadinfo=c74b3000 task=f4721a80)
Oct 10 09:58:39 woof kernel: Stack: f8b9de75 f7848140 00000003 1bbe0000 00000000 ffffffea 00000003 00000005
Oct 10 09:58:39 woof kernel:        0000000d 00000005 00000000 f58c0a00 00000001 0000000d 20200000 20202020
Oct 10 09:58:39 woof kernel:        20203320 20202020 62312020 30306562 00183030 c8fb2f00 00000001 00000001
Oct 10 09:58:39 woof kernel: Call Trace:
Oct 10 09:58:39 woof kernel:  [<f8b98cff>] lm_dlm_lock+0x52/0x5e [lock_dlm]
Oct 10 09:58:39 woof kernel:  [<f8b98cad>] lm_dlm_lock+0x0/0x5e [lock_dlm]
Oct 10 09:58:39 woof kernel:  [<f8bd000c>] gfs_lm_lock+0x3d/0x5c [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc5039>] gfs_glock_xmote_th+0xae/0x1d3 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc463c>] rq_promote+0x126/0x150 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc4840>] run_queue+0xee/0x113 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc5af1>] gfs_glock_nq+0x93/0x144 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc619d>] gfs_glock_nq_init+0x18/0x2d [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be3926>] get_local_rgrp+0xca/0x1b0 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be3a9c>] gfs_inplace_reserve_i+0x90/0xd0 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be046b>] gfs_quota_lock_m+0xbf/0x117 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8a2b>] do_do_write_buf+0x3a1/0x485 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc56a1>] glock_wait_internal+0x16b/0x26a [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8c91>] do_write_buf+0x182/0x1b6 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd7be5>] walk_vm+0xb3/0x111 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8d65>] gfs_write+0xa0/0xc2 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8b0f>] do_write_buf+0x0/0x1b6 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8cc5>] gfs_write+0x0/0xc2 [gfs]
Oct 10 09:58:39 woof kernel:  [<c0162987>] vfs_write+0x9e/0x110
Oct 10 09:58:39 woof kernel:  [<c0162aa4>] sys_write+0x41/0x6a
Oct 10 09:58:39 woof kernel:  [<c0104035>] syscall_call+0x7/0xb
Oct 10 09:58:39 woof kernel: Code: 7c 24 14 89 4c 24 0c 89 5c 24 10 89 6c 24 08 89 74 24 04 c7 04 24 28 e6 b9 f8 e8 0e 94 58 c7 c7 04 24 75 de b9 f8 e8 02 94 58 c7 <0f> 0b 9b 01 a0 e4 b9 f8 c7 04 24 3c e5 b9 f8 e8 1a 8a 58 c7 66
Oct 10 09:58:39 woof kernel:  <0>Fatal exception: panic in 5 seconds

Panic 3 (from when afp was running on the cluster):




Sep  7 15:37:44 meow kernel: ------------[ cut here ]------------
Sep  7 15:37:44 meow kernel: kernel BUG at /usr/src/build/588748-i686/BUILD/smp/src/dlm/plock.c:500!
Sep  7 15:37:44 meow kernel: invalid operand: 0000 [#1]
Sep  7 15:37:44 meow kernel: SMP
Sep  7 15:37:44 meow kernel: Modules linked in: appletalk nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) sunrpc md5 ipv6 ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 floppy ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Sep  7 15:37:44 meow kernel: CPU:    3
Sep  7 15:37:44 meow kernel: EIP:    0060:[<f8b9a3f7>]    Tainted: GF     VLI
Sep  7 15:37:44 meow kernel: EFLAGS: 00010292   (2.6.12-1.1398_FC4smp)
Sep  7 15:37:44 meow kernel: EIP is at update_lock+0x87/0x9b [lock_dlm]
Sep  7 15:37:44 meow kernel: eax: 00000004   ebx: fffffff5   ecx: c035ca4c   edx: 00000282
Sep  7 15:37:44 meow kernel: esi: 00000000   edi: e99c2c00   ebp: 00000000   esp: d05dedb4
Sep  7 15:37:44 meow kernel: ds: 007b   es: 007b   ss: 0068
Sep  7 15:37:44 meow kernel: Process afpd (pid: 3872, threadinfo=d05de000 task=d6447550)
Sep  7 15:37:44 meow kernel: Stack: badc0ded f8b9d0d6 fffffff5 f8b9da70 f8b9d101 06609291 f7943000 00000000
Sep  7 15:37:44 meow kernel:        f8b9a499 7ffffff8 00000000 7ffffff8 00000000 d05dede8 d7636700 7ffffff8
Sep  7 15:37:44 meow kernel:        00000000 d05deea8 d05dee28 f8b9a987 00000001 7ffffff8 00000000 7ffffff8
Sep  7 15:37:44 meow kernel: Call Trace:
Sep  7 15:37:44 meow kernel:  [<f8b9a499>] add_lock+0x8e/0xed [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9a987>] fill_gaps+0x87/0x10e [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9aa51>] lock_case3+0x43/0xac [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9aeac>] plock_internal+0x1aa/0x370 [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9b614>] lm_dlm_plock+0x25b/0x2dc [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9b3b9>] lm_dlm_plock+0x0/0x2dc [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8bdc1c3>] gfs_lm_plock+0x45/0x57 [gfs]
Sep  7 15:37:44 meow kernel:  [<f8be5731>] gfs_lock+0xcd/0x11c [gfs]
Sep  7 15:37:44 meow kernel:  [<f8be5664>] gfs_lock+0x0/0x11c [gfs]
Sep  7 15:37:44 meow kernel:  [<c0176c4f>] fcntl_setlk64+0x16c/0x26a
Sep  7 15:37:44 meow kernel:  [<c0162e93>] fget+0x3b/0x42
Sep  7 15:37:44 meow kernel:  [<c0172bfd>] sys_fcntl64+0x55/0x97
Sep  7 15:37:44 meow kernel:  [<c0104025>] syscall_call+0x7/0xb
Sep  7 15:37:44 meow kernel: Code: 01 00 00 c7 04 24 a8 da b9 f8 e8 7c 77 58 c7 89 5c 24 04 c7 04 24 08 d1 b9 f8 e8 6c 77 58 c7 c7 04 24 d6 d0 b9 f8 e8 60 77 58 c7 <0f> 0b f4 01 70 da b9 f8 c7 04 24 10 db b9 f8 e8 78 6d 58 c7 55
Sep  7 15:37:44 meow kernel:  <0>Fatal exception: panic in 5 seconds


Thanks for any help,
  Ethan






