[Linux-cluster] GFS journaling related crash

Sridharan Ramaswamy (srramasw) srramasw@cisco.com
Wed Feb 14 20:49:31 UTC 2007


GFS experts,
 
I'm looking for help debugging this crash. I'm running GFS with a small
journal size. I understand this is a bit of uncharted territory for GFS,
but we have a need to be really stingy with journal overhead.
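
For context, the journal size is set at gfs_mkfs time with -J (in MB).
A sketch of the kind of invocation I mean (the lock protocol, names,
journal count, and device are taken from the logs and iostat output
below; the -J value itself is only an example, not our exact setting):

    gfs_mkfs -p lock_dlm -t cisco:gfs2 -j 3 -J 32 /dev/hda12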
 
GFS is consistently crashing with the same stack trace, which seems
related to the journal log being committed to disk. gfs_tool counters
shows log space used close to 100%, and iostat shows a huge write just
before the crash. Probably the journal blocks being flushed to disk?
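
(The data below came from watching the counters and iostat while
bonnie++ runs; something along these lines:

    watch -n 1 'gfs_tool counters /mnt/gfs2 | grep -E "log space used|log wraps"'
    iostat 2 1000 | grep hda12
)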
 
I tried the workaround of reducing the "max_atomic_write" value listed
in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=146672, but no
luck.
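
(For reference, tunables like this are changed with gfs_tool settune;
I used something along these lines, where the value shown here is only
an illustration since the bug report doesn't pin down how far to reduce
it:

    gfs_tool gettune /mnt/gfs2 | grep max_atomic_write
    gfs_tool settune /mnt/gfs2 max_atomic_write 32768
)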
 
Could this be a GFS bug that only gets exercised with a small journal size?
 
The full stack trace, gfs_tool counters, and iostat data are listed
below. Any help is appreciated!
 
- Sridharan
Cisco Systems, Inc
 
---------------------
 
Logs:

Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1: Looking at journal...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=1: Done
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Trying to acquire journal lock...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Looking at journal...
Feb 14 11:39:33 cfs1 kernel: GFS: fsid=cisco:gfs2.0: jid=2: Done
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: head_off = 61856, head_wrap = 8
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: dump_off = 61856, dump_wrap = 7
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0: assertion "FALSE" failed
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   function = check_seg_usage
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/log.c, line = 590
Feb 14 11:41:50 cfs1 kernel: GFS: fsid=cisco:gfs2.0:   time = 1171482110
Feb 14 11:41:50 cfs1 kernel: ------------[ cut here ]------------
Feb 14 11:41:50 cfs1 kernel: kernel BUG at /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/util.c:211!
Feb 14 11:41:50 cfs1 kernel: invalid operand: 0000 [#1]
Feb 14 11:41:50 cfs1 kernel: SMP
Feb 14 11:41:50 cfs1 kernel: Modules linked in: lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) nfsd exportfs nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random e1000 e100 mii floppy ext3 jbd
Feb 14 11:41:50 cfs1 kernel: CPU:    0
Feb 14 11:41:50 cfs1 kernel: EIP:    0060:[<e10864f6>]    Tainted: GF VLI
Feb 14 11:41:50 cfs1 kernel: EFLAGS: 00010246   (2.6.9-42.7.ELsmp)
Feb 14 11:41:50 cfs1 kernel: EIP is at gfs_assert_i+0x38/0x69 [gfs]
Feb 14 11:41:50 cfs1 kernel: eax: 000000f8   ebx: e0c9e000   ecx: c7cb3b7c   edx: e108d0a2
Feb 14 11:41:50 cfs1 kernel: esi: e108702a   edi: e108a021   ebp: 0000024e   esp: c7cb3b78
Feb 14 11:41:50 cfs1 kernel: ds: 007b   es: 007b   ss: 0068
Feb 14 11:41:50 cfs1 kernel: Process bonnie++ (pid: 8091, threadinfo=c7cb3000 task=df3c43b0)
Feb 14 11:41:50 cfs1 kernel: Stack: e108d0a2 e0cc2734 e108a021 e0cc2734 e108702a e0cc2734 e1089e34 0000024e
Feb 14 11:41:50 cfs1 kernel:        e0cc2734 45d365fe 00000000 0000f190 c48b9080 e0c9e000 e106e880 e1089e34
Feb 14 11:41:50 cfs1 kernel:        e0cc2734 45d365fe 00000000 0000f190 c48b9080 e0c9e000 e106e880 e1089e34
Feb 14 11:41:50 cfs1 kernel:        0000024e 00000007 00000000 0000f1a0 00000000 0000f1c0 00000000 00000008
Feb 14 11:41:50 cfs1 kernel:        0000024e 00000007 00000000 0000f1a0 00000000 0000f1c0 00000000 00000008
Feb 14 11:41:50 cfs1 kernel: Call Trace:
Feb 14 11:41:50 cfs1 kernel:  [<e106e880>] check_seg_usage+0x197/0x19f [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106ea03>] sync_trans+0x143/0x1b1 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1071641>] quota_trans_size+0x20/0x36 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106ebdd>] disk_commit+0xec/0x264 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106edf6>] log_refund+0x61/0x187 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106f235>] log_flush_internal+0xec/0x19e [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106e131>] gfs_log_reserve+0x19e/0x20e [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1063f77>] glock_wait_internal+0x1e3/0x1ef [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10642e1>] gfs_glock_nq+0xe3/0x116 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1085334>] gfs_trans_begin_i+0xfd/0x15a [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e106842e>] inode_init_and_link+0x1fe/0x388 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10648a3>] gfs_glock_nq_init+0x13/0x26 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e10648f4>] gfs_glock_nq_num+0x2e/0x71 [gfs]
Feb 14 11:41:50 cfs1 kernel:  [<e1068767>] gfs_createi+0x1af/0x1f1 [gfs]
Feb 14 11:41:51 cfs1 kernel:  [<e107a9d0>] gfs_create+0x68/0x16f [gfs]
Feb 14 11:41:51 cfs1 kernel:  [<c0167bd0>] vfs_create+0xbc/0x103
Feb 14 11:41:51 cfs1 kernel:  [<c0167fa8>] open_namei+0x177/0x579
Feb 14 11:41:51 cfs1 kernel:  [<c015a442>] filp_open+0x45/0x70
Feb 14 11:41:51 cfs1 kernel:  [<c02d2e66>] __cond_resched+0x14/0x39
Feb 14 11:41:51 cfs1 kernel:  [<c01c3326>] direct_strncpy_from_user+0x3e/0x5d
Feb 14 11:41:51 cfs1 kernel:  [<c015a79e>] sys_open+0x31/0x7d
Feb 14 11:41:51 cfs1 kernel:  [<c015a7fc>] sys_creat+0x12/0x16
Feb 14 11:41:51 cfs1 kernel:  [<c02d48bb>] syscall_call+0x7/0xb
Feb 14 11:41:51 cfs1 kernel:  [<c02d007b>] packet_rcv+0x8e/0x307
Feb 14 11:41:51 cfs1 kernel: Code: c3 83 b8 30 02 00 00 00 74 2c ff 35 90 7d 43 c0 8d 80 34 47 02 00 50 55 ff 74 24 20 50 51 50 52 50 68 a2 d0 08 e1 e8 ac c3 09 df <0f> 0b d3 00 25 d1 08 e1 83 c4 28 e8 9d f8 07 df ff 35 90 7d 43
Feb 14 11:41:51 cfs1 kernel:  <0>Fatal exception: panic in 5 seconds

 
[root@cfs1 ~]$ gfs_tool counters /mnt/gfs2
 
                                  locks 11451
                             locks held 11450
                           freeze count 0
                          incore inodes 5713
                       metadata buffers 5838
                        unlinked inodes 1
                              quota IDs 2
                     incore log buffers 893
                         log space used 98.41%
              meta header cache entries 53
                     glock dependencies 0
                 glocks on reclaim list 0
                              log wraps 7
                   outstanding LM calls 0
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 9
                         glock nq calls 131554
                         glock dq calls 125830
                   glock prefetch calls 0
                          lm_lock calls 11484
                        lm_unlock calls 12
                           lm callbacks 11496
                     address operations 0
                      dentry operations 4
                      export operations 0
                        file operations 17177
                       inode operations 45696
                       super operations 45690
                          vm operations 0
                        block I/O reads 52
                       block I/O writes 17269

 
 
[root@cfs1 ~]$ iostat 2 1000 | grep hda12
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda12             0.42         1.52         1.84       2242       2720
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             2.51         0.00        20.10          0         40
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
hda12          4467.68         0.00     35741.41          0      70768
hda12             0.00         0.00         0.00          0          0
hda12             0.00         0.00         0.00          0          0
