<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3020" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=198104519-14022007>GFS experts,</SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>I'm looking for help debugging
this crash. I'm running GFS with a very small journal size. I understand this is
somewhat uncharted territory for GFS, but we need to be really stingy with journal
overhead. </SPAN></DIV>
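<DIV><SPAN class=198104519-14022007>For context, the filesystem would have been created along the following lines. The fs name "cisco:gfs2" and device hda12 match the logs below, and the logs show journals up to jid=2; the exact "-J" journal size is my assumption, not necessarily the value used:</SPAN></DIV>

```shell
# Hypothetical sketch of the mkfs invocation, not the exact command used:
# the fs name, lock protocol, and device are taken from the logs in this
# report; "-j 3" (journal count, since jid=0..2 appear in the logs) and
# "-J 32" (32MB per journal) are assumed values for a deliberately small
# journal. gfs_mkfs takes the per-journal size in megabytes with -J.
gfs_mkfs -p lock_dlm -t cisco:gfs2 -j 3 -J 32 /dev/hda12
```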
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>GFS is consistently crashing with the same stack
trace, which seems related to the journal log being committed to disk. gfs_tool
counters show log space used close to 100%, and iostat shows a huge write just
before the crash. Probably the journal blocks being flushed to
disk?</SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>I tried the workaround of reducing the
"max_atomic_write" value listed in </SPAN><SPAN
class=198104519-14022007><SPAN class=187442120-14022007><FONT color=#0000ff><A
href="https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146672">https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146672</A></FONT></SPAN><SPAN
class=187442120-14022007><SPAN class=198104519-14022007>, but no
luck.</SPAN></SPAN></SPAN></DIV>
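<DIV><SPAN class=198104519-14022007>For completeness, the tunable was presumably changed at runtime with gfs_tool settune on the mounted filesystem; the value below is illustrative, not necessarily the one the bugzilla suggests:</SPAN></DIV>

```shell
# Lower max_atomic_write (in bytes) on the mount point used in this report.
# 4096 is an illustrative value, not necessarily the one from the bugzilla.
gfs_tool settune /mnt/gfs2 max_atomic_write 4096
# Confirm the new value:
gfs_tool gettune /mnt/gfs2 | grep max_atomic_write
```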
<DIV><SPAN class=198104519-14022007><SPAN class=187442120-14022007><SPAN
class=198104519-14022007></SPAN></SPAN></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>Could this be a GFS bug that is exercised
only with a small journal size? </SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>The full stack trace, gfs_tool counters, and iostat
data are listed below. Any help is appreciated!</SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>- Sridharan</SPAN></DIV>
<DIV><SPAN class=198104519-14022007>Cisco Systems, Inc</SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>---------------------</SPAN></DIV>
<DIV><SPAN class=198104519-14022007></SPAN> </DIV>
<DIV><SPAN class=198104519-14022007>Logs:</SPAN></DIV>
<DIV><BR>Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1: Looking at
journal...<BR>Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1:
Done<BR>Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Trying to
acquire journal lock...<BR>Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0:
jid=2: Looking at journal...<BR>Feb 14 11:39:33 cfs1 kernel: GFS:
fsid=cisco:gfs2.0: jid=2: Done</DIV>
<DIV>Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: head_off = 61856,
head_wrap = 8<BR>Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: dump_off =
61856, dump_wrap = 7<BR>Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:
assertion "FALSE" failed<BR>Feb 14 11:41:50 cfs1 kernel: GFS:
fsid=cisco:gfs2.0: function = check_seg_usage<BR>Feb 14 11:41:50
cfs1 kernel: GFS: fsid=cisco:gfs2.0: file =
/download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/log.c, line = 590<BR>Feb 14
11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: time =
1171482110<BR>Feb 14 11:41:50 cfs1 kernel: ------------[ cut here
]------------<BR>Feb 14 11:41:50 cfs1 kernel: kernel BUG at
/download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/util.c:211!<BR>Feb 14
11:41:50 cfs1 kernel: invalid operand: 0000 [#1]<BR>Feb 14 11:41:50 cfs1 kernel:
SMP <BR>Feb 14 11:41:50 cfs1 kernel: Modules linked in: lock_dlm(U) dlm(U)
gfs(U) lock_harness(U) cman(U) nfsd exportfs nfs lockd nfs_acl md5 ipv6
parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod
button battery ac uhci_hcd ehci_hcd hw_random e1000 e100 mii floppy ext3
jbd<BR>Feb 14 11:41:50 cfs1 kernel: CPU: 0<BR>Feb 14 11:41:50
cfs1 kernel: EIP: 0060:[<e10864f6>]
Tainted: GF VLI<BR>Feb 14 11:41:50 cfs1 kernel: EFLAGS:
00010246 (2.6.9-42.7.ELsmp) <BR>Feb 14 11:41:50 cfs1 kernel: EIP is
at gfs_assert_i+0x38/0x69 [gfs]<BR>Feb 14 11:41:50 cfs1 kernel: eax:
000000f8 ebx: e0c9e000 ecx: c7cb3b7c edx:
e108d0a2<BR>Feb 14 11:41:50 cfs1 kernel: esi: e108702a edi:
e108a021 ebp: 0000024e esp: c7cb3b78<BR>Feb 14 11:41:50
cfs1 kernel: ds: 007b es: 007b ss: 0068<BR>Feb 14
11:41:50 cfs1 kernel: Process bonnie++ (pid: 8091, threadinfo=c7cb3000
task=df3c43b0)<BR>Feb 14 11:41:50 cfs1 kernel: Stack: e108d0a2 e0cc2734 e108a021
e0cc2734 e108702a e0cc2734 e1089e34 0000024e <BR>Feb 14 11:41:50 cfs1
kernel: e0cc2734 45d365fe 00000000
0000f190 c48b9080 e0c9e000 e106e880 e1089e34 <BR>Feb 14 11:41:50 cfs1
kernel: e0cc2734 45d365fe 00000000
0000f190 c48b9080 e0c9e000 e106e880 e1089e34 <BR>Feb 14 11:41:50 cfs1
kernel: 0000024e 00000007 00000000
0000f1a0 00000000 0000f1c0 00000000 00000008 <BR>Feb 14 11:41:50 cfs1
kernel: 0000024e 00000007 00000000
0000f1a0 00000000 0000f1c0 00000000 00000008 <BR>Feb 14 11:41:50 cfs1 kernel:
Call Trace:<BR>Feb 14 11:41:50 cfs1 kernel: [<e106e880>]
check_seg_usage+0x197/0x19f [gfs]<BR>Feb 14 11:41:50 cfs1 kernel:
[<e106ea03>] sync_trans+0x143/0x1b1 [gfs]<BR>Feb 14 11:41:50 cfs1
kernel: [<e1071641>] quota_trans_size+0x20/0x36 [gfs]<BR>Feb 14
11:41:50 cfs1 kernel: [<e106ebdd>] disk_commit+0xec/0x264
[gfs]<BR>Feb 14 11:41:50 cfs1 kernel: [<e106edf6>]
log_refund+0x61/0x187 [gfs]<BR>Feb 14 11:41:50 cfs1 kernel:
[<e106f235>] log_flush_internal+0xec/0x19e [gfs]<BR>Feb 14 11:41:50 cfs1
kernel: [<e106e131>] gfs_log_reserve+0x19e/0x20e [gfs]<BR>Feb 14
11:41:50 cfs1 kernel: [<e1063f77>] glock_wait_internal+0x1e3/0x1ef
[gfs]<BR>Feb 14 11:41:50 cfs1 kernel: [<e10642e1>]
gfs_glock_nq+0xe3/0x116 [gfs]<BR>Feb 14 11:41:50 cfs1 kernel:
[<e1085334>] gfs_trans_begin_i+0xfd/0x15a [gfs]<BR>Feb 14 11:41:50 cfs1
kernel: [<e106842e>] inode_init_and_link+0x1fe/0x388 [gfs]<BR>Feb 14
11:41:50 cfs1 kernel: [<e10648a3>] gfs_glock_nq_init+0x13/0x26
[gfs]<BR>Feb 14 11:41:50 cfs1 kernel: [<e10648f4>]
gfs_glock_nq_num+0x2e/0x71 [gfs]<BR>Feb 14 11:41:50 cfs1 kernel:
[<e1068767>] gfs_createi+0x1af/0x1f1 [gfs]<BR>Feb 14 11:41:51 cfs1
kernel: [<e107a9d0>] gfs_create+0x68/0x16f [gfs]<BR>Feb 14 11:41:51
cfs1 kernel: [<c0167bd0>] vfs_create+0xbc/0x103<BR>Feb 14 11:41:51
cfs1 kernel: [<c0167fa8>] open_namei+0x177/0x579<BR>Feb 14 11:41:51
cfs1 kernel: [<c015a442>] filp_open+0x45/0x70<BR>Feb 14 11:41:51
cfs1 kernel: [<c02d2e66>] __cond_resched+0x14/0x39<BR>Feb 14
11:41:51 cfs1 kernel: [<c01c3326>]
direct_strncpy_from_user+0x3e/0x5d<BR>Feb 14 11:41:51 cfs1 kernel:
[<c015a79e>] sys_open+0x31/0x7d<BR>Feb 14 11:41:51 cfs1 kernel:
[<c015a7fc>] sys_creat+0x12/0x16<BR>Feb 14 11:41:51 cfs1 kernel:
[<c02d48bb>] syscall_call+0x7/0xb<BR>Feb 14 11:41:51 cfs1 kernel:
[<c02d007b>] packet_rcv+0x8e/0x307<BR>Feb 14 11:41:51 cfs1 kernel: Code:
c3 83 b8 30 02 00 00 00 74 2c ff 35 90 7d 43 c0 8d 80 34 47 02 00 50
55 ff 74 24 20 50 51 50 52 50 68 a2 d0 08 e1 e8 ac c3 09 df <0f>
0b d3 00 25 d1 08 e1 83 c4 28 e8 9d f8 07 df ff 35 90 7d 43 <BR>Feb 14
11:41:51 cfs1 kernel: <0>Fatal exception: panic in 5
seconds<BR></DIV>
<DIV> </DIV>
<DIV>[root@cfs1 ~]$ gfs_tool counters /mnt/gfs2</DIV>
<DIV> </DIV>
<DIV>locks 11451<BR>
locks held 11450<BR>
freeze count 0<BR>
incore inodes 5713<BR>
metadata buffers 5838<BR>
unlinked inodes 1<BR>
quota IDs 2<BR>
incore log buffers 893<BR>
log space used 98.41%<BR>
meta header cache entries 53<BR>
glock dependencies 0<BR>
glocks on reclaim list 0<BR>
log wraps 7<BR>
outstanding LM calls 0<BR>
outstanding BIO calls 0<BR>
fh2dentry misses 0<BR>
glocks reclaimed 9<BR>
glock nq calls 131554<BR>
glock dq calls 125830<BR>
glock prefetch calls 0<BR>
lm_lock calls 11484<BR>
lm_unlock calls 12<BR>
lm callbacks 11496<BR>
address operations 0<BR>
dentry operations 4<BR>
export operations 0<BR>
file operations 17177<BR>
inode operations 45696<BR>
super operations 45690<BR>
vm operations 0<BR>
block I/O reads 52<BR>
block I/O writes 17269<BR></DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV>[root@cfs1 ~]$ iostat 2 1000 | grep hda12</DIV>
<DIV>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn<BR>
hda12 0.42 1.52 1.84 2242 2720<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 2.51 0.00 20.10 0 40<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 4467.68 0.00 35741.41 0 70768<BR>
hda12 0.00 0.00 0.00 0 0<BR>
hda12 0.00 0.00 0.00 0 0<BR></DIV></BODY></HTML>