[Linux-cluster] GFS crash

Bas van der Vlies basv at sara.nl
Mon Jul 3 07:44:04 UTC 2006


We are using kernel 2.6.16 and the CVS STABLE code 1.0.2. We have a 5-node 
GFS cluster that exports the GFS filesystems over NFS to our cluster. This 
is the error log; it crashed in gfs_glockd:
------------------------------------------
isa_vg5_lv2 send einval to 3
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 send einval to 4
lisa_vg5_lv1 unlock febd02eb no id
7367 pr_start cb jid 2 id 3
7367 pr_start 121 done 0
7428 recovery_done jid 2 msg 308 191b
7428 recovery_done nodeid 3 flg 1b
7428 recovery_done start_done 121
7348 pr_start last_stop 95 last_start 121 last_finish 95
7348 pr_start count 4 type 1 event 121 flags a1b
7348 pr_start cb jid 2 id 3
7348 pr_start 121 done 0
7330 pr_start last_stop 87 last_start 121 last_finish 87
7330 pr_start count 4 type 1 event 121 flags a1b
7330 pr_start cb jid 2 id 3
7330 pr_start 121 done 0
7409 recovery_done jid 2 msg 308 191b
7409 recovery_done nodeid 3 flg 1b
7409 recovery_done start_done 121
7390 recovery_done jid 2 msg 308 91b
7390 recovery_done nodeid 3 flg 1b
7390 recovery_done start_done 121
7310 pr_start last_stop 75 last_start 121 last_finish 75
7310 pr_start count 4 type 1 event 121 flags a1b
7310 pr_start cb jid 2 id 3
7310 pr_start 121 done 0
7371 recovery_done jid 2 msg 308 91b
7371 recovery_done nodeid 3 flg 1b
7371 recovery_done start_done 121
7290 pr_start last_stop 56 last_start 121 last_finish 56
7290 pr_start count 4 type 1 event 121 flags a1b
7290 pr_start cb jid 2 id 3
7290 pr_start 121 done 0
7352 recovery_done jid 2 msg 308 91b
7352 recovery_done nodeid 3 flg 1b
7352 recovery_done start_done 121
7271 pr_start last_stop 40 last_start 121 last_finish 40
7271 pr_start count 4 type 1 event 121 flags a1b
7271 pr_start cb jid 2 id 3
7271 pr_start 121 done 0
7333 recovery_done jid 2 msg 308 91b
7333 recovery_done nodeid 3 flg 1b
7333 recovery_done start_done 121
7252 pr_start last_stop 24 last_start 121 last_finish 24
7252 pr_start count 4 type 1 event 121 flags 1a1b
7252 pr_start cb jid 2 id 3
7252 pr_start 121 done 0
7314 recovery_done jid 2 msg 308 91b
7314 recovery_done nodeid 3 flg 1b
7314 recovery_done start_done 121
7294 recovery_done jid 2 msg 308 91b
7294 recovery_done nodeid 3 flg 1b
7294 recovery_done start_done 121
7275 recovery_done jid 2 msg 308 91b
7275 recovery_done nodeid 3 flg 1b
7275 recovery_done start_done 121
7256 recovery_done jid 2 msg 308 191b
7256 recovery_done nodeid 3 flg 1b
7256 recovery_done start_done 121
7310 pr_finish flags 81b
7368 pr_finish flags 81b
7348 pr_finish flags 81b
7444 pr_finish flags 181b
7329 pr_finish flags 81b
7425 pr_finish flags 181b
7405 pr_finish flags 181b
7290 pr_finish flags 81b
7252 pr_finish flags 181b
7386 pr_finish flags 81b
7272 pr_finish flags 81b
7251 pr_start last_stop 121 last_start 125 last_finish 121
7251 pr_start count 5 type 2 event 125 flags 1a1b
7251 pr_start 125 done 1
7252 pr_finish flags 181b
7271 pr_start last_stop 121 last_start 127 last_finish 121
7271 pr_start count 5 type 2 event 127 flags a1b
7271 pr_start 127 done 1
7271 pr_finish flags 81b
7291 pr_start last_stop 121 last_start 129 last_finish 121
7291 pr_start count 5 type 2 event 129 flags a1b
7291 pr_start 129 done 1
7291 pr_finish flags 81b
7311 pr_start last_stop 121 last_start 131 last_finish 121
7311 pr_start count 5 type 2 event 131 flags a1b
7311 pr_start 131 done 1
7311 pr_finish flags 81b
7330 pr_start last_stop 121 last_start 133 last_finish 121
7330 pr_start count 5 type 2 event 133 flags a1b
7330 pr_start 133 done 1
7330 pr_finish flags 81b
7349 pr_start last_stop 121 last_start 135 last_finish 121
7349 pr_start count 5 type 2 event 135 flags a1b
7349 pr_start 135 done 1
7349 pr_finish flags 81b
7367 pr_start last_stop 121 last_start 137 last_finish 121
7367 pr_start count 5 type 2 event 137 flags a1b
7367 pr_start 137 done 1
7367 pr_finish flags 81b
7386 pr_start last_stop 121 last_start 139 last_finish 121
7386 pr_start count 5 type 2 event 139 flags a1b
7386 pr_start 139 done 1
7386 pr_finish flags 81b
7406 pr_start last_stop 121 last_start 141 last_finish 121
7406 pr_start count 5 type 2 event 141 flags 1a1b
7406 pr_start 141 done 1
7406 pr_finish flags 181b
7425 pr_start last_stop 121 last_start 143 last_finish 121
7425 pr_start count 5 type 2 event 143 flags 1a1b
7425 pr_start 143 done 1
7425 pr_finish flags 181b
7443 pr_start last_stop 121 last_start 145 last_finish 121
7443 pr_start count 5 type 2 event 145 flags 1a1b
7443 pr_start 145 done 1
7443 pr_finish flags 181b

lock_dlm:  Assertion failed on line 357 of file /usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c
lock_dlm:  assertion:  "!error"
lock_dlm:  time = 1486517232
lisa_vg5_lv1: error=-22 num=3,990448c lkf=9 flags=84

------------[ cut here ]------------
kernel BUG at /usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:357!
invalid opcode: 0000 [#1]
SMP
Modules linked in: lock_dlm dlm cman dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage piix e1000 gfs lock_harness dm_mod
CPU:    0
EIP:    0060:[<f8aa5586>]    Tainted: GF     VLI
EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
EIP is at do_dlm_unlock+0x91/0xaa [lock_dlm]
eax: 00000004   ebx: dbdff440   ecx: 00014e5f   edx: 00000246
esi: ffffffea   edi: f8c0b000   ebp: f22bdee0   esp: f22bded4
ds: 007b   es: 007b   ss: 0068
Process gfs_glockd (pid: 7427, threadinfo=f22bc000 task=f209d030)
Stack: <0>f8aa9d89 f8c0b000 dbdf7120 f22bdeec f8aa5824 dbdff440 f22bdf00 f899a7bc
        dbdff440 00000003 dbdf7144 f22bdf24 f8990ca4 f8c0b000 dbdff440 00000003
        f89c4f00 dbde1200 dbdf7120 dbdf7120 f22bdf40 f899393a dbdf7120 dbde1200
Call Trace:
  [<c0103599>] show_stack_log_lvl+0xad/0xb5
  [<c01036db>] show_registers+0x10d/0x176
  [<c01038ad>] die+0xf2/0x16d
  [<c0103996>] do_trap+0x6e/0x8a
  [<c0103bed>] do_invalid_op+0x90/0x97
  [<c010322f>] error_code+0x4f/0x54
  [<f8aa5824>] lm_dlm_unlock+0x1d/0x24 [lock_dlm]
  [<f899a7bc>] gfs_lm_unlock+0x2c/0x46 [gfs]
  [<f8990ca4>] gfs_glock_drop_th+0xf0/0x12d [gfs]
  [<f899393a>] rgrp_go_drop_th+0x1d/0x24 [gfs]
  [<f89901f9>] rq_demote+0x79/0x95 [gfs]
  [<f89902b4>] run_queue+0x56/0xbb [gfs]
  [<f89903d6>] unlock_on_glock+0x1f/0x29 [gfs]
  [<f899232a>] gfs_reclaim_glock+0xbf/0x138 [gfs]
  [<f8986682>] gfs_glockd+0x3b/0xe3 [gfs]
  [<c0100ed9>] kernel_thread_helper+0x5/0xb
Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 a0 a6 aa f8 e8 80 19 67 c7 83 c4 34 68 89 9d aa f8 e8 73 19 67 c7 <0f> 0b 65 01 c0 a4 aa f8 68 a0 a5 aa f8 e8 27 12 67 c7 8d 65 f8
  <3>fh_update: test2/CHGCAR already up-to-date!
fh_update: test2/CHGCAR already up-to-date!
fh_update: test2/WAVECAR already up-to-date!
fh_update: test2/WAVECAR already up-to-date!
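
As a reading aid for the log above (not part of the original trace): the
"error=-22" printed by the lock_dlm assertion is a negated errno value, and
22 is EINVAL, which matches the "send einval" messages earlier in the log.
The esi value ffffffea in the register dump, read as a signed 32-bit
integer, is also -22. A minimal sketch for checking both, assuming Python
on a Linux host; the helper names describe_errno and to_signed32 are
hypothetical and exist only for this illustration:

```python
import errno
import os


def describe_errno(kernel_error: int) -> str:
    """Map a negative kernel-style return code to its errno name and message."""
    code = -kernel_error  # kernel code reports failures as -errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def to_signed32(hex_value: str) -> int:
    """Interpret a 32-bit hex register value as a signed integer."""
    value = int(hex_value, 16)
    # Values with the top bit set are negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    print(describe_errno(-22))      # error=-22 from the lock_dlm assertion line
    print(to_signed32("ffffffea"))  # esi from the register dump
```

This only decodes the numbers; it says nothing about why the unlock request
was rejected.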

-- 
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



