[Linux-cluster] Kernel Crashes on all nodes when one dies
isplist at logicore.net
Mon Apr 23 16:12:47 UTC 2007
Someone accidentally misconfigured iptables on a node so that it could no longer
communicate with the cluster. That should have been the end of the problem, one
node down, but instead all nodes died with a kernel crash.
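(Not part of the original report, but for anyone who lands here: the usual way to avoid this class of problem on RHEL4-era cluster suite is to accept all inter-node traffic before any generic DROP/REJECT rules, since cman, ccsd, dlm and rgmanager listen on several ports. A minimal sketch, assuming the cluster interconnect is eth0 and the nodes share the 192.168.1.0/24 subnet; both are placeholders.)

```shell
# Hypothetical example, not taken from this thread: whitelist the
# cluster subnet first so later DROP rules cannot cut a node off.
# Interface (eth0) and subnet (192.168.1.0/24) are placeholders.
iptables -I INPUT  -i eth0 -s 192.168.1.0/24 -j ACCEPT
iptables -I OUTPUT -o eth0 -d 192.168.1.0/24 -j ACCEPT

# Persist across reboots on RHEL/CentOS 4
service iptables save
```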
Here is a paste from one of the logs. I think this is the right section, which
shows the dying nodes:
Mike
Apr 22 11:55:26 qm250 kernel: qm move flags 0,1,0 ids 0,3,0
Apr 22 11:55:26 qm250 kernel: qm move use event 3
Apr 22 11:55:26 qm250 kernel: qm recover event 3 (first)
Apr 22 11:55:26 qm250 kernel: qm add nodes
Apr 22 11:55:26 qm250 kernel: qm total nodes 2
Apr 22 11:55:26 qm250 kernel: qm rebuild resource directory
Apr 22 11:55:26 qm250 kernel: qm rebuilt 8 resources
Apr 22 11:55:26 qm250 kernel: qm recover event 3 done
Apr 22 11:55:26 qm250 kernel: qm move flags 0,0,1 ids 0,3,3
Apr 22 11:55:26 qm250 kernel: qm process held requests
Apr 22 11:55:26 qm250 kernel: qm processed 0 requests
Apr 22 11:55:26 qm250 kernel: qm recover event 3 finished
Apr 22 11:55:26 qm250 kernel: clvmd move flags 1,0,0 ids 2,2,2
Apr 22 11:55:26 qm250 kernel: qm move flags 1,0,0 ids 3,3,3
Apr 22 11:55:26 qm250 kernel: 2640 pr_start last_stop 0 last_start 4 last_finish 0
Apr 22 11:55:26 qm250 kernel: 2640 pr_start count 2 type 2 event 4 flags 250
Apr 22 11:55:26 qm250 kernel: 2640 claim_jid 1
Apr 22 11:55:26 qm250 kernel: 2640 pr_start 4 done 1
Apr 22 11:55:26 qm250 kernel: 2640 pr_finish flags 5a
Apr 22 11:55:27 qm250 kernel: 2566 recovery_done jid 1 msg 309 a
Apr 22 11:55:27 qm250 kernel: 2566 recovery_done nodeid 250 flg 18
Apr 22 11:55:27 qm250 kernel:
Apr 22 11:55:27 qm250 kernel: lock_dlm: Assertion failed on line 357 of file /home/buildcentos/rpmbuild/BUILD/gfs-kernel-2.6.9-60/up/src/dlm/lock.c
Apr 22 11:55:27 qm250 kernel: lock_dlm: assertion: "!error"
Apr 22 11:55:27 qm250 kernel: lock_dlm: time = 14525882
Apr 22 11:55:27 qm250 kernel: qm: error=-22 num=2,1a lkf=10000 flags=84
Apr 22 11:55:27 qm250 kernel:
Apr 22 11:55:27 qm250 kernel: ------------[ cut here ]------------
Apr 22 11:55:27 qm250 kernel: kernel BUG at /home/buildcentos/rpmbuild/BUILD/gfs-kernel-2.6.9-60/up/src/dlm/lock.c:357!
Apr 22 11:55:27 qm250 kernel: invalid operand: 0000 [#1]
Apr 22 11:55:27 qm250 kernel: Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) parport_pc lp parport autofs4 dlm(U) cman(U) md5 ipv6 sunrpc dm_mirror dm_mod uhci_hcd e100 mii floppy ext3 jbd qla2200 qla2xxx scsi_transport_fc sd_mod scsi_mod
Apr 22 11:55:27 qm250 kernel: CPU: 0
Apr 22 11:55:27 qm250 kernel: EIP: 0060:[<e09aacfe>] Not tainted VLI
Apr 22 11:55:27 qm250 kernel: EFLAGS: 00010246 (2.6.9-42.0.3.EL)
Apr 22 11:55:27 qm250 kernel: EIP is at do_dlm_unlock+0x89/0x9e [lock_dlm]
Apr 22 11:55:27 qm250 kernel: eax: 00000001 ebx: dfd552e0 ecx: e09b089f edx: dafe9f44
Apr 22 11:55:27 qm250 kernel: esi: ffffffea edi: dfd552e0 ebp: e0a62000 esp: dafe9f40
Apr 22 11:55:27 qm250 kernel: ds: 007b es: 007b ss: 0068
Apr 22 11:55:27 qm250 kernel: Process gfs_glockd (pid: 2647, threadinfo=dafe9000 task=de442c50)
Apr 22 11:55:27 qm250 kernel: Stack: e09b089f e0a62000 00000003 e09aafff e0ae3e51 dfd7d4ac e0a62000 e0b156c0
Apr 22 11:55:27 qm250 kernel:        e0ad6bd4 dfd7d4ac e0b156c0 dafe9fb4 e0ad5683 dfd7d4ac 00000001 e0ad5840
Apr 22 11:55:27 qm250 kernel:        dfd7d4ac dfd7d4ac e0ad5af9 dfd7d550 e0ad9182 dafe9000 dafe9fc0 e0ac8e9a
Apr 22 11:55:27 qm250 kernel: Call Trace:
Apr 22 11:55:27 qm250 kernel: [<e09aafff>] lm_dlm_unlock+0x13/0x1b [lock_dlm]
Apr 22 11:55:27 qm250 kernel: [<e0ae3e51>] gfs_lm_unlock+0x2b/0x40 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad6bd4>] gfs_glock_drop_th+0x17a/0x1b0 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5683>] rq_demote+0x15c/0x1da [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5840>] run_queue+0x5a/0xc1 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad5af9>] unlock_on_glock+0x6e/0xc8 [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ad9182>] gfs_reclaim_glock+0x257/0x2ae [gfs]
Apr 22 11:55:27 qm250 kernel: [<e0ac8e9a>] gfs_glockd+0x38/0xde [gfs]
Apr 22 11:55:27 qm250 kernel: [<c0120049>] default_wake_function+0x0/0xc
Apr 22 11:55:27 qm250 kernel: [<c0318d7e>] ret_from_fork+0x6/0x14
Apr 22 11:55:27 qm250 kernel: [<c0120049>] default_wake_function+0x0/0xc
Apr 22 11:55:28 qm250 kernel: [<e0ac8e62>] gfs_glockd+0x0/0xde [gfs]
Apr 22 11:55:28 qm250 kernel: [<c01041dd>] kernel_thread_helper+0x5/0xb
Apr 22 11:55:28 qm250 kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 ac 09 9b e0 e8 10 9c 77 df 83 c4 34 68 9f 08 9b e0 e8 03 9c 77 df <0f> 0b 65 01 2e 07 9b e0 68 a1 08 9b e0 e8 5b 90 77 df 5b 5e c3
Apr 22 11:55:28 qm250 kernel: <0>Fatal exception: panic in 5 seconds