[Linux-cluster] Oops
Wagner Ferenc
wferi at niif.hu
Thu May 24 13:51:08 UTC 2007
Hi,
I wasn't sure whether to send this to LKML or here, but DLM seems
to be involved. Please let me know if I should repost it
elsewhere.
It's a vanilla 2.6.21 kernel patched with cluster-2.00.00 (plus the
three extra exports for GFS1). Config attached. The machine froze
during the morning updatedb cron job, which ran a recursive find
over the shared GFS filesystem. Two other nodes doing the same thing
at the same time are still up.
I experienced a similar hang with cluster-1 not long ago, although that
one locked up only the cluster software, not the whole machine.
Please let me know if I've left out any necessary information.
clvm: 2.02.26
libdevmapper: 1.02.19
openais: 0.80.2
Otherwise a stock Debian Etch system.
--
Regards,
Feri.
kernel BUG at kernel/workqueue.c:212!
invalid opcode: 0000 [#1]
SMP
Modules linked in: button ac battery ipv6 gfs lock_nolock lock_dlm gfs2 dlm configfs loop evdev i2c_piix4 pcspkr psmouse rtc serio_raw sworks_agp agpgart i2c_core xfs dm_mirror dm_snapshot ide_generic dm_round_robin dm_emc dm_multipath dm_mod sd_mod ide_disk ata_generic libata serverworks ohci_hcd generic qla2xxx firmware_class scsi_transport_fc scsi_mod usbcore tg3 ide_core thermal processor fan
CPU: 2
EIP: 0060:[<c012f476>] Not tainted VLI
EFLAGS: 00010213 (2.6.21gfs-xeon #2)
EIP is at queue_work+0x2f/0x49
eax: dfb176e4 ebx: 00000002 ecx: f7e66a80 edx: dfb176e0
esi: 00000002 edi: e2bfa080 ebp: 00000000 esp: f7a91bb4
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process dlm_recv/2 (pid: 10261, ti=f7a90000 task=c196aa50 task.ti=f7a90000)
Stack: f798d434 f7c5a980 c026dc79 ab0ee1c1 e2bfa080 dfaea000 f798d434 00200000
00000020 00000000 c1b6bd80 0101e520 e2bfa080 e2bfa080 c0272f90 000000d0
0000000e f7c5a980 00000000 00000039 00000000 00000000 00000000 00000286
Call Trace:
[<c026dc79>] tcp_rcv_established+0x53a/0x7d1
[<c0272f90>] tcp_v4_do_rcv+0x28/0x2c5
[<c0275306>] tcp_v4_rcv+0x81b/0x88d
[<c02957a8>] packet_rcv_spkt+0x0/0x150
[<c024035d>] dev_hard_start_xmit+0x1be/0x21d
[<c025ccef>] ip_local_deliver+0x187/0x230
[<c025cb2f>] ip_rcv+0x409/0x442
[<c02958ed>] packet_rcv_spkt+0x145/0x150
[<c011b434>] __wake_up+0x32/0x43
[<c023ff15>] netif_receive_skb+0x2dc/0x350
[<f8879cfa>] tg3_poll+0x5b6/0x82f [tg3]
[<c0241a00>] net_rx_action+0x9d/0x1a8
[<c012608e>] __do_softirq+0x66/0xcc
[<c0126137>] do_softirq+0x43/0x51
[<c010648f>] do_IRQ+0x5c/0x71
[<c010474b>] common_interrupt+0x23/0x28
[<c0134e03>] down_read_trylock+0x10/0x1d
[<f8c9d90a>] dlm_receive_message+0xa2/0xc0b [dlm]
[<c023870d>] sock_common_recvmsg+0x3e/0x54
[<c02371ff>] sock_recvmsg+0xec/0x107
[<f8c9fe36>] dlm_process_incoming_buffer+0x11a/0x18c [dlm]
[<f8ca3e4c>] receive_from_sock+0x124/0x217 [dlm]
[<c010648f>] do_IRQ+0x5c/0x71
[<f8ca3b4e>] process_recv_sockets+0xf/0x15 [dlm]
[<c012f559>] run_workqueue+0x85/0x125
[<f8ca3b3f>] process_recv_sockets+0x0/0x15 [dlm]
[<c012fde7>] worker_thread+0xf9/0x124
[<c011d23f>] default_wake_function+0x0/0xc
[<c012fcee>] worker_thread+0x0/0x124
[<c013248a>] kthread+0xb2/0xdc
[<c01323d8>] kthread+0x0/0xdc
[<c0104993>] kernel_thread_helper+0x7/0x10
=======================
Code: 64 8b 35 04 00 00 00 f0 0f ba 2a 00 19 c0 31 db 85 c0 75 2c 8d 41 08 39 41 08 8b 1d f4 94 39 c0 0f 45 de 8d 42 04 39 42 04 74 04 <0f> 0b eb fe 8b 01 f7 d0 8b 04 98 e8 34 ff ff ff bb 01 00 00 00
EIP: [<c012f476>] queue_work+0x2f/0x49 SS:ESP 0068:f7a91bb4
Kernel panic - not syncing: Fatal exception in interrupt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/octet-stream
Size: 20742 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070524/e29729cd/attachment.obj>