[Linux-cluster] GFS crash

Chmouel Boudjnah cboudjnah at squiz.net
Mon Oct 3 23:27:58 UTC 2005


Hello,

I had a crash on a server running GFS-6.1 with kernel 2.6.9-11.ELsmp. I am
using GFS on an AoE SAN drive.

I am not sure whether the problem is with the AoE SAN or with GFS. It would
be great if someone could tell me, so I can redirect the bug report to the
CORAID people if necessary.

First, the logs show some odd messages mentioning sataide (I am not
sure whether the SAN is using that):

Sep 30 17:43:20 srv kernel: e send einval to 2
Sep 30 17:43:20 srv kernel: sataide send einval to 2
Sep 30 17:43:20 srv last message repeated 38 times
Sep 30 17:43:20 srv kernel: sataide unlock ff050383 no id
Sep 30 17:43:20 srv kernel: 231834 id 0 -1,3 1
Sep 30 17:43:20 srv kernel: 7814 qc 2,59f30e -1,5 id ffbe0378 sts 0 0
Sep 30 17:43:20 srv kernel: 19531 lk 5,59f30e id 0 -1,3 0
Sep 30 17:43:20 srv kernel: 4189 lk 2,2ed6bc id 0 -1,3 10001
Sep 30 17:43:20 srv kernel: 7814 qc 5,231834 -1,3 id 5dc0124 sts 0 0
Sep 30 17:43:20 srv kernel: 7814 qc 5,59f30e -1,3 id 27b00cf sts 0 0
Sep 30 17:43:20 srv kernel: 4189 lk 5,2ed6bc id 0 -1,3 1
Sep 30 17:43:20 srv kernel: 7814 qc 2,2ed6bc -1,3 id 1c0202 sts 0 0
Sep 30 17:43:20 srv kernel: 4189 lk 2,2903b3 id 0 -1,3 10001
Sep 30 17:43:20 srv kernel: 7814 qc 5,2ed6bc -1,3 id 227032a sts 0 0
Sep 30 17:43:20 srv kernel: 4189 lk 5,2903b3 id 0 -1,3 1
Sep 30 17:43:20 srv kernel: 7814 qc 2,2903b3 -1,3 id 23c036d sts 0 0
Sep 30 17:43:20 srv kernel: 4189 lk 2,2ba987 id 0 -1,3 10001
Sep 30 17:43:20 srv kernel: 4189 lk 5,2ba987 id 0 -1,3 1
Sep 30 17:43:20 srv kernel: 7814 qc 2,2ba987 -1,3 id 3ab033c sts 0 0
Sep 30 17:43:20 srv kernel: 7814 qc 5,2903b3 -1,3 id 1c80004 sts 0 0
Sep 30 17:43:20 srv kernel: 4189 lk 2,2ce731 id 0 -1,3 10001
Sep 30 17:43:20 srv kernel: 10052 lk 2,500e75 id 0 -1,5 0
Sep 30 17:43:20 srv kernel: 4189 lk 5,2ce731 id 0 -1,3 1
Sep 30 17:43:20 srv kernel: 7814 qc 5,2ba987 -1,3 id 1f003a sts 0 0
Sep 30 17:43:20 srv kernel: 7814 qc 2,2ce731 -1,3 id ff74033d sts 0 0
Sep 30 17:43:20 srv kernel: 19531 lk 5,500e74 id ffd101bd 3,5 805
Sep 30 17:43:20 srv kernel: 7814 qc 5,500e74 3,5 id ffd101bd sts 0 0
Sep 30 17:43:20 srv kernel: 7814 qc 2,500e75 -1,5 id 1660224 sts 0 0
Sep 30 17:43:20 srv kernel: 10052 lk 5,500e75 id 0 -1,3 0
Sep 30 17:43:20 srv kernel: 7814 qc 5,500e75 -1,3 id 3210323 sts 0 0
Sep 30 17:43:20 srv kernel: 29523 lk 2,217df id 0 -1,3 10000
Sep 30 17:43:20 srv kernel: 7814 qc 2,217df -1,3 id 5019b sts 0 0
Sep 30 17:43:20 srv kernel: 29523 lk 5,217df id 0 -1,3 0
Sep 30 17:43:21 srv kernel: 7814 qc 5,217df -1,3 id 2ae0267 sts 0 0
Sep 30 17:43:21 srv kernel: 7814 qc 5,2ce731 -1,3 id 7d0232 sts 0 0
Sep 30 17:43:21 srv kernel: 4189 lk 2,263a00 id 0 -1,3 10001
Sep 30 17:43:21 srv kernel: 7814 qc 2,263a00 -1,3 id 12700c3 sts 0 0
Sep 30 17:43:21 srv kernel: 4189 lk 5,263a00 id 0 -1,3 1
Sep 30 17:43:21 srv kernel: 4189 lk 2,2c446d id 0 -1,3 10001
Sep 30 17:43:21 srv kernel: 7814 qc 5,263a00 -1,3 id ffc00230 sts 0 0
Sep 30 17:43:21 srv kernel: 4189 lk 5,2c446d id 0 -1,3 1
Sep 30 17:43:21 srv kernel: 7814 qc 2,2c446d -1,3 id 34903b4 sts 0 0
Sep 30 17:43:21 srv kernel: 4189 lk 2,1e7a15 id 0 -1,3 10001
Sep 30 17:43:21 srv kernel: 7814 qc 5,2c446d -1,3 id fea901a1 sts 0 0
Sep 30 17:43:21 srv kernel: 4189 lk 5,1e7a15 id 0 -1,3 1


The GFS crash follows immediately afterwards:

Sep 30 17:43:22 srv kernel: lock_dlm:  Assertion failed on line 353 of
file /usr/src/build/574067-i686/BUILD/smp/src/dlm/lock.c
Sep 30 17:43:22 srv kernel: lock_dlm:  assertion:  "!error"
Sep 30 17:43:22 srv kernel: lock_dlm:  time = 2509316164
Sep 30 17:43:22 srv kernel: sataide: error=-22 num=5,5bf2f1 lkf=801
flags=84
Sep 30 17:43:22 srv kernel:
Sep 30 17:43:22 srv kernel: ------------[ cut here ]------------
Sep 30 17:43:22 srv kernel: kernel BUG
at /usr/src/build/574067-i686/BUILD/smp/src/dlm/lock.c:353!
Sep 30 17:43:22 srv kernel: invalid operand: 0000 [#1]
Sep 30 17:43:22 srv kernel: SMP
Sep 30 17:43:22 srv kernel: Modules linked in: lock_dlm(U) aoe(U) gfs(U)
lock_harness(U) dlm(U) cman(U) md5 ipv6 joydev button battery
ac uhci_hcd ehci_hcd e1000 floppy sg dm_snapshot dm_zero dm_mirror ext3
jbd dm_mod mptscsih mptbase sd_mod scsi_mod
Sep 30 17:43:22 srv kernel: CPU:    0
Sep 30 17:43:22 srv kernel: EIP:    0060:[<f8b5360d>]    Not tainted VLI
Sep 30 17:43:22 srv kernel: EFLAGS: 00010246   (2.6.9-11.ELsmp)
Sep 30 17:43:22 srv kernel: EIP is at do_dlm_unlock+0xaa/0xbf [lock_dlm]
Sep 30 17:43:22 srv kernel: eax: 00000001   ebx: ffffffea   ecx:
f63f5f04   edx: f8b5809e
Sep 30 17:43:22 srv kernel: esi: cb3ac080   edi: cb3ac080   ebp:
f8b1d000   esp: f63f5f00
Sep 30 17:43:23 srv kernel: ds: 007b   es: 007b   ss: 0068
Sep 30 17:43:23 srv kernel: Process lock_dlm1 (pid: 7818,
threadinfo=f63f5000 task=f75bb0b0)
Sep 30 17:43:23 srv kernel: Stack: f8b5809e f8b1d000 00000003 f8b538c0
f8ab24f2 00000001 dcbdb3c0 dcbdb3a4
Sep 30 17:43:23 srv kernel:        f8aa8852 f8add0c0 d73b9e80 dcbdb3a4
f8add0c0 cb3ac080 f8aa7d4b dcbdb3a4
Sep 30 17:43:23 srv kernel:        00000001 00000001 f8aa7e02 dcbdb3c0
dcbdb3a4 f8aa99af cb3ac080 f7d50e00
Sep 30 17:43:23 srv kernel: Call Trace:
Sep 30 17:43:23 srv kernel:  [<f8b538c0>] lm_dlm_unlock+0x14/0x1c
[lock_dlm]
Sep 30 17:43:23 srv kernel:  [<f8ab24f2>] gfs_lm_unlock+0x2c/0x42 [gfs]
Sep 30 17:43:23 srv kernel:  [<f8aa8852>] gfs_glock_drop_th+0xf3/0x12d
[gfs]
Sep 30 17:43:23 srv kernel:  [<f8aa7d4b>] rq_demote+0x7f/0x98 [gfs]
Sep 30 17:43:23 srv kernel:  [<f8aa7e02>] run_queue+0x5a/0xc1 [gfs]
Sep 30 17:43:23 srv kernel:  [<f8aa99af>] blocking_cb+0x39/0x7a [gfs]
Sep 30 17:43:23 srv kernel:  [<f8b5727b>] process_blocking+0x90/0x93
[lock_dlm]
Sep 30 17:43:23 srv kernel:  [<f8b578c8>] dlm_async+0x28b/0x2ff
[lock_dlm]
Sep 30 17:43:23 srv kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 30 17:43:23 srv kernel:  [<c011dc6f>] default_wake_function+0x0/0xc
Sep 30 17:43:23 srv kernel:  [<f8b5763d>] dlm_async+0x0/0x2ff [lock_dlm]
Sep 30 17:43:23 srv kernel:  [<c0132e31>] kthread+0x73/0x9b
Sep 30 17:43:23 srv kernel:  [<c0132dbe>] kthread+0x0/0x9b
Sep 30 17:43:23 srv kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
Sep 30 17:43:23 srv kernel: Code: 76 34 8b 06 ff 76 2c ff 76 08 ff 76 04
ff 76 0c 53 ff 70 18 68 a9 81 b5 f8 e8 d6 e3 5c c7 83 c4 34 68 9e 80 b5
f8 e8 c9 e3 5c c7 <0f> 0b 61 01 ef 7f b5 f8 68 a0 80 b5 f8 e8 84 db 5c
c7 5b 5e c3
Sep 30 17:43:23 srv kernel:  <0>Fatal exception: panic in 5 seconds
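
As a side note, the error=-22 in the assertion output and the ebx value
ffffffea in the register dump both look like a negated errno. Here is a
minimal sketch for decoding them, assuming standard Linux errno numbering.
The helper names to_signed32 and decode_kernel_error are made up for
illustration and are not part of GFS or the DLM:

```python
import errno
import os


def to_signed32(hex_value: str) -> int:
    """Interpret a 32-bit register dump such as 'ffffffea' as a signed int."""
    raw = int(hex_value, 16)
    # Values with the top bit set are negative in two's complement.
    return raw - (1 << 32) if raw & 0x80000000 else raw


def decode_kernel_error(value: int) -> str:
    """Map a negated errno (the usual kernel convention) to name and message."""
    code = abs(value)
    # Assumption: the host running this script uses the same errno numbering
    # as the kernel that produced the log.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    print(decode_kernel_error(-22))                      # error=-22 from lock_dlm
    print(decode_kernel_error(to_signed32("ffffffea")))  # ebx from the oops
```

Both values decode to EINVAL, which would match the "send einval" messages
earlier in the log. It does not by itself say whether the cause is on the
AoE side or in GFS.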


Cheers, Chmouel.


-- 
Chmouel Boudjnah - Squiz.net - http://www.squiz.net



