[Linux-cluster] Re: Hard lockups during file transfer to GNBD/GFS device

David Brieck Jr. dbrieck at gmail.com
Wed Nov 1 13:50:49 UTC 2006


Well, the problem has gotten even stranger: now a node is mysteriously
crashing with nothing in the logs:

Nov  1 04:02:19 http2 kernel: dlm: http: process_lockqueue_reply id 20260 state 0
Nov  1 04:02:19 http2 kernel: dlm: http: process_lockqueue_reply id 202e2 state 0
Nov  1 04:02:19 http2 kernel: dlm: http: process_lockqueue_reply id 303d7 state 0
Nov  1 04:02:19 http2 kernel: dlm: http: process_lockqueue_reply id 50159 state 0
Nov  1 06:29:19 http2 sshd(pam_unix)[24026]: session opened for user root by root(uid=0)
Nov  1 06:45:02 http2 syslogd 1.4.1: restart.
Nov  1 06:45:02 http2 syslog: syslogd startup succeeded
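
For anyone who hasn't run into that first message before: as far as I
can tell, "process_lockqueue_reply id ... state 0" just means a reply
came in for a lock request the node was no longer waiting on, so DLM
logs it and drops it. Roughly, with made-up names (example_lkb,
example_process_lockqueue_reply) rather than the actual dlm-kernel
source, the check looks something like this:

#include <linux/kernel.h>

/* Illustrative sketch only, not the real dlm-kernel code. */
struct example_lkb {
	unsigned int lkb_id;              /* lock id echoed in the log line  */
	int          lkb_lockqueue_state; /* 0 = no reply currently expected */
};

static void example_process_lockqueue_reply(const char *ls_name,
					    struct example_lkb *lkb)
{
	if (!lkb->lkb_lockqueue_state) {
		/* Reply arrived for a request we are no longer waiting on:
		 * log it (as in the messages above) and otherwise ignore it. */
		printk(KERN_WARNING "dlm: %s: process_lockqueue_reply id %x state %d\n",
		       ls_name, lkb->lkb_id, lkb->lkb_lockqueue_state);
		return;
	}
	/* ...otherwise the reply would be matched to the pending request... */
}

On their own those four lines are probably harmless noise; the odd
part is that the box then logs nothing at all until the 06:45 restart.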


Earlier in the day, though, I had this crash on my GNBD server (it
might not be related to my other problem, but hey, who knows). It
looks like it's related to DLM:

Oct 31 10:35:55 storage1 gnbd_serv[5073]: server process 25402 exited because of signal 15
Oct 31 10:35:55 storage1 gnbd_serv[5073]: server process 25400 exited because of signal 15
Oct 31 10:39:45 storage1 kernel:  rebuilt 1 resources
Oct 31 10:39:45 storage1 kernel: backups rebuilt 98 resources
Oct 31 10:39:45 storage1 kernel: clvmd purge requests
Oct 31 10:39:45 storage1 kernel: backups purge requests
Oct 31 10:39:45 storage1 kernel: clvmd purged 0 requests
Oct 31 10:39:45 storage1 kernel: backups purged 0 requests
Oct 31 10:39:45 storage1 kernel: configs mark waiting requests
Oct 31 10:39:45 storage1 kernel: configs marked 0 requests
Oct 31 10:39:45 storage1 kernel: configs purge locks of departed nodes
Oct 31 10:39:45 storage1 kernel: configs purged 11 locks
Oct 31 10:39:45 storage1 kernel: configs update remastered resources
Oct 31 10:39:45 storage1 kernel: configs updated 1 resources
Oct 31 10:39:45 storage1 kernel: configs rebuild locks
Oct 31 10:39:45 storage1 kernel: configs rebuilt 1 locks
Oct 31 10:39:45 storage1 kernel: configs recover event 230 done
Oct 31 10:39:45 storage1 kernel: configs move flags 0,0,1 ids 229,230,230
Oct 31 10:39:45 storage1 kernel: configs process held requests
Oct 31 10:39:45 storage1 kernel: configs processed 0 requests
Oct 31 10:39:45 storage1 kernel: configs resend marked requests
Oct 31 10:39:45 storage1 kernel: configs resent 0 requests
Oct 31 10:39:45 storage1 kernel: configs recover event 230 finished
Oct 31 10:39:45 storage1 kernel: clvmd mark waiting requests
Oct 31 10:39:45 storage1 kernel: clvmd marked 0 requests
Oct 31 10:39:46 storage1 kernel: clvmd purge locks of departed nodes
Oct 31 10:39:46 storage1 kernel: clvmd purged 5 locks
Oct 31 10:39:46 storage1 kernel: clvmd update remastered resources
Oct 31 10:39:46 storage1 kernel: clvmd updated 0 resources
Oct 31 10:39:46 storage1 kernel: clvmd rebuild locks
Oct 31 10:39:46 storage1 kernel: clvmd rebuilt 0 locks
Oct 31 10:39:46 storage1 kernel: clvmd recover event 230 done
Oct 31 10:39:46 storage1 kernel: Magma mark waiting requests
Oct 31 10:39:46 storage1 kernel: Magma marked 0 requests
Oct 31 10:39:46 storage1 kernel: Magma purge locks of departed nodes
Oct 31 10:39:46 storage1 kernel: Magma purged 0 locks
Oct 31 10:39:46 storage1 kernel: Magma update remastered resources
Oct 31 10:39:46 storage1 kernel: Magma updated 0 resources
Oct 31 10:39:46 storage1 kernel: Magma rebuild locks
Oct 31 10:39:46 storage1 kernel:
Oct 31 10:39:46 storage1 kernel: DLM:  Assertion failed on line 105 of file /home/buildcentos/rpmbuild/BUILD/dlm-kernel-2.6.9-42/hugemem/src/rebuild.c
Oct 31 10:39:46 storage1 kernel: DLM:  assertion:  "root->res_newlkid_expect"
Oct 31 10:39:46 storage1 kernel: DLM:  time = 2164169409
Oct 31 10:39:46 storage1 kernel: newlkid_expect=0
Oct 31 10:39:46 storage1 kernel:
Oct 31 10:39:46 storage1 kernel: ------------[ cut here ]------------
Oct 31 10:39:46 storage1 kernel: kernel BUG at /home/buildcentos/rpmbuild/BUILD/dlm-kernel-2.6.9-42/hugemem/src/rebuild.c:105!
Oct 31 10:39:46 storage1 kernel: invalid operand: 0000 [#1]
Oct 31 10:39:46 storage1 kernel: SMP
Oct 31 10:39:46 storage1 kernel: Modules linked in: ip_vs_wlc ip_vs
lock_dlm(U) gfs(U) lock_harness(U) mptctl mptbase dell_rbu parport_pc
lp parport autofs4 i2c_dev i2c_core gnbd(U) dlm(U) cman(U) sunrpc
ipmi_devintf ipmi_si ipmi_msghandler iptable_filter ip_tables md5 ipv6
dm_mirror joydev button battery ac uhci_hcd ehci_hcd hw_random shpchp
e1000 bonding(U) floppy sg ext3 jbd dm_mod megaraid_mbox megaraid_mm
sd_mod scsi_mod
Oct 31 10:39:46 storage1 kernel: CPU:    0
Oct 31 10:39:46 storage1 kernel: EIP:    0060:[<f8a2cfcd>]    Not tainted VLI
Oct 31 10:39:46 storage1 kernel: EFLAGS: 00010246   (2.6.9-42.0.2.ELhugemem)
Oct 31 10:39:46 storage1 kernel: EIP is at have_new_lkid+0x79/0xb7 [dlm]
Oct 31 10:39:46 storage1 kernel: eax: 00000001   ebx: dd76a0ec   ecx: e1069e3c   edx: f8a340dd
Oct 31 10:39:46 storage1 kernel: esi: dd76a150   edi: 009803dc   ebp: 39f2e400   esp: e1069e38
Oct 31 10:39:46 storage1 kernel: ds: 007b   es: 007b   ss: 0068
Oct 31 10:39:46 storage1 kernel: Process dlm_recvd (pid: 4314, threadinfo=e1069000 task=e13c1630)
Oct 31 10:39:46 storage1 kernel: Stack: f8a340dd f8a34136 00000000 f8a34086 00000069 f8a3403b f8a3411d 80fe9ac1
Oct 31 10:39:46 storage1 kernel:        000002e8 00060028 f8a2e46b 6b914018 00000001 00000020 6b914000 39f2e400
Oct 31 10:39:46 storage1 kernel:        00000001 6b914000 f8a2e9f6 000002e8 00004040 00001000 de541580 00000001
Oct 31 10:39:46 storage1 kernel: Call Trace:
Oct 31 10:39:46 storage1 kernel:  [<f8a2e46b>] rebuild_rsbs_lkids_recv+0x99/0x106 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a2e9f6>] rcom_process_message+0x2e8/0x405 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a2ecfd>] process_recovery_comm+0x3c/0xa7 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a2ab8b>] midcomms_process_incoming_buffer+0x1bc/0x1f8 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<02142d40>] buffered_rmqueue+0x17d/0x1a5
Oct 31 10:39:46 storage1 kernel:  [<021204e9>] autoremove_wake_function+0x0/0x2d
Oct 31 10:39:46 storage1 kernel:  [<02142e1c>] __alloc_pages+0xb4/0x29d
Oct 31 10:39:46 storage1 kernel:  [<f8a28e01>] receive_from_sock+0x192/0x26c [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a29cc9>] dlm_recvd+0x0/0x95 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a29b73>] process_sockets+0x56/0x91 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<f8a29d4e>] dlm_recvd+0x85/0x95 [dlm]
Oct 31 10:39:46 storage1 kernel:  [<02133089>] kthread+0x73/0x9b
Oct 31 10:39:46 storage1 kernel:  [<02133016>] kthread+0x0/0x9b
Oct 31 10:39:46 storage1 kernel:  [<021041f5>] kernel_thread_helper+0x5/0xb
Oct 31 10:39:46 storage1 kernel: Code: 41 a3 f8 68 3b 40 a3 f8 6a 69 68 86 40 a3 f8 e8 17 59 6f 09 ff 73 60 68 36 41 a3 f8 e8 0a 59 6f 09 68 dd 40 a3 f8 e8 00 59 6f 09 <0f> 0b 69 00 3b 40 a3 f8 83 c4 20 68 df 40 a3 f8 e8 55 50 6f 09
Oct 31 10:39:46 storage1 kernel:  <0>Fatal exception: panic in 5 seconds
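
In case it helps anyone reading this later: the "Assertion failed" /
"kernel BUG" / "panic in 5 seconds" sequence is just what a DLM
assertion produces when its condition turns out false. It prints the
diagnostics and then calls BUG(), which oopses dlm_recvd and takes the
node down. A rough sketch of that pattern (illustrative only, named
EXAMPLE_DLM_ASSERT here; not the actual macro from dlm-kernel):

#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/bug.h>

/* Illustrative sketch only, not the real assertion macro from dlm-kernel. */
#define EXAMPLE_DLM_ASSERT(cond, extra_debug)                                 \
do {                                                                          \
	if (!(cond)) {                                                        \
		printk(KERN_ERR "\nDLM:  Assertion failed on line %d of "     \
		       "file %s\n", __LINE__, __FILE__);                      \
		printk(KERN_ERR "DLM:  assertion:  \"%s\"\n", #cond);         \
		printk(KERN_ERR "DLM:  time = %lu\n", jiffies);               \
		extra_debug;  /* e.g. the "newlkid_expect=0" line above */    \
		BUG();        /* invalid operand oops, then the timed panic */\
	}                                                                     \
} while (0)

The condition that failed here is "root->res_newlkid_expect" in
have_new_lkid(), i.e. during recovery a new-lockid reply showed up for
a resource that was not expecting one (hence the newlkid_expect=0
line), which took the whole node down.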



