[Linux-cluster] GFS2 crash

Scooter Morris scooter at cgl.ucsf.edu
Wed Mar 17 18:41:14 UTC 2010


After removing kmod-gfs2 from all nodes, we ran just fine until last 
night, when we saw the same crash:

[2010-03-17 04:40:01]Wed Mar 17 05:40:01 PDT 2010
[2010-03-17 04:40:01]Unable to handle kernel NULL pointer dereference at 0000000000000078 RIP:
[2010-03-17 04:55:24] [<ffffffff88768383>] :gfs2:revoke_lo_add+0x1a/0x32
[2010-03-17 04:55:24]PGD 0
[2010-03-17 04:55:24]Oops: 0002 [1] SMP
[2010-03-17 04:55:24]last sysfs file: /devices/pci0000:00/0000:00:06.0/0000:0b:00.0/0000:0c:09.0/0000:0d:00.0/host0/rport-0:0-4/target0:0:4/0:0:4:1/state
[2010-03-17 04:55:24]CPU 7
[2010-03-17 04:55:24]Modules linked in: ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink iptable_filter ip_tables bridge autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm configfs lockd sunrpc xt_tcpudp ipt_REJECT arpt_mangle arptable_filter arp_tables x_tables ib_iser libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipoib_helper ipv6 xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport st ide_cd sg cdrom hpilo pcspkr serio_raw bnx2 dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc ata_piix libata shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
[2010-03-17 04:55:25]Pid: 792, comm: kswapd0 Not tainted 2.6.18-164.11.1.el5 #1
[2010-03-17 04:55:25]RIP: 0010:[<ffffffff88768383>]  [<ffffffff88768383>] :gfs2:revoke_lo_add+0x1a/0x32
[2010-03-17 04:55:25]RSP: 0018:ffff81082e073ae8  EFLAGS: 00010286
[2010-03-17 04:55:25]RAX: 0000000000000000 RBX: ffff810031d9c2b0 RCX: ffff810041619e40
[2010-03-17 04:55:25]RDX: ffff81063fc3d1b0 RSI: ffff810819749708 RDI: ffff810819749000
[2010-03-17 04:55:25]RBP: ffff81063fc3d190 R08: ffff81082fead486 R09: ffff81082e073b20
[2010-03-17 04:55:25]R10: ffff8101065ae8a0 R11: ffffffff88768369 R12: ffff810819749000
[2010-03-17 04:55:25]R13: 0000000000000000 R14: ffff810031d9c2b0 R15: ffff810819749000
[2010-03-17 04:55:26]FS:  0000000000000000(0000) GS:ffff81082fead340(0000) knlGS:0000000000000000
[2010-03-17 04:55:26]CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[2010-03-17 04:55:26]CR2: 0000000000000078 CR3: 0000000000201000 CR4: 00000000000006e0
[2010-03-17 04:55:26]Process kswapd0 (pid: 792, threadinfo ffff81082e072000, task ffff81082f4ef7e0)
[2010-03-17 04:55:26]Stack:  ffffffff8876983c 000000002e073e10 ffff810031d9c2b0 ffff81010e355078
[2010-03-17 04:55:26] 0000000000000000 0000000000000000 ffffffff8876a9a2 000000000000000e
[2010-03-17 04:55:26] ffff81010e355078 00000000000000b0 ffff81082e073cf0 ffff810819749000
[2010-03-17 04:55:26]Call Trace:
[2010-03-17 04:55:26] [<ffffffff8876983c>] :gfs2:gfs2_remove_from_journal+0x11f/0x131
[2010-03-17 04:55:26] [<ffffffff8876a9a2>] :gfs2:gfs2_invalidatepage+0xea/0x151
[2010-03-17 04:55:26] [<ffffffff8876a5e5>] :gfs2:gfs2_writepage_common+0x95/0xb1
[2010-03-17 04:55:26] [<ffffffff8876ac0f>] :gfs2:gfs2_jdata_writepage+0x56/0x98
[2010-03-17 04:55:26] [<ffffffff800ca21c>] shrink_inactive_list+0x3fd/0x8d8
[2010-03-17 04:55:26] [<ffffffff8004819b>] __pagevec_release+0x19/0x22
[2010-03-17 04:55:26] [<ffffffff800c9cfe>] shrink_active_list+0x4b4/0x4c4
[2010-03-17 04:55:26] [<ffffffff80013007>] shrink_zone+0xf7/0x15d
[2010-03-17 04:55:26] [<ffffffff80057e41>] kswapd+0x323/0x46c
[2010-03-17 04:55:26] [<ffffffff800a00b7>] autoremove_wake_function+0x0/0x2e
[2010-03-17 04:55:27] [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
[2010-03-17 04:55:27] [<ffffffff80057b1e>] kswapd+0x0/0x46c
[2010-03-17 04:55:27] [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
[2010-03-17 04:55:27] [<ffffffff80032950>] kthread+0xfe/0x132
[2010-03-17 04:55:27] [<ffffffff8009cd34>] request_module+0x0/0x14d
[2010-03-17 04:55:27] [<ffffffff8005dfb1>] child_rip+0xa/0x11
[2010-03-17 04:55:27] [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
[2010-03-17 04:55:27] [<ffffffff80032852>] kthread+0x0/0x132
[2010-03-17 04:55:27] [<ffffffff8005dfa7>] child_rip+0x0/0x11
[2010-03-17 04:55:27]
[2010-03-17 04:55:27]
[2010-03-17 04:55:27]Code: ff 40 78 c7 40 50 01 00 00 00 ff 87 dc 06 00 00 48 89 d7 e9
[2010-03-17 04:55:27]RIP  [<ffffffff88768383>] :gfs2:revoke_lo_add+0x1a/0x32
[2010-03-17 04:55:27] RSP<ffff81082e073ae8>
[2010-03-17 04:55:27]CR2: 0000000000000078
[2010-03-17 04:55:27]<0>Kernel panic - not syncing: Fatal exception

So, it looks like it wasn't the old kmod-gfs2 :-(
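One way to double-check that no leftover kmod-gfs2 module is still being picked up on a node is to ask `modinfo -n gfs2` which gfs2.ko file it resolves for the running kernel. The sketch below is a heuristic under stated assumptions, not an official check: it assumes Python 3.7+ (newer than RHEL 5's stock Python) and assumes kmod packages install their modules under the `extra/` or `weak-updates/` directories of `/lib/modules/<version>/`, while the in-kernel module lives under `kernel/fs/gfs2/`. The helper names are hypothetical.

```python
import subprocess

# Assumption: kmod packages place modules in these /lib/modules/<ver>/
# subdirectories; the in-kernel gfs2 lives under kernel/fs/gfs2/ instead.
OUT_OF_TREE_DIRS = ("/extra/", "/weak-updates/", "/updates/")


def is_out_of_tree(module_path: str) -> bool:
    """Heuristic: True if the module path looks like it came from a kmod package."""
    return any(d in module_path for d in OUT_OF_TREE_DIRS)


def gfs2_module_path() -> str:
    """Ask modinfo which gfs2.ko file it resolves for the running kernel."""
    result = subprocess.run(
        ["modinfo", "-n", "gfs2"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def report() -> str:
    """Return a one-line summary; never raises if modinfo is missing or fails."""
    try:
        path = gfs2_module_path()
    except (OSError, subprocess.CalledProcessError) as exc:
        return f"could not run modinfo: {exc}"
    if is_out_of_tree(path):
        return f"WARNING: gfs2 resolves to an out-of-tree module: {path}"
    return f"gfs2 resolves to an in-tree module: {path}"


if __name__ == "__main__":
    print(report())
```

Note that this only inspects the file on disk. A module that was loaded before the package was removed stays in memory until it is unloaded or the node is rebooted.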

-- scooter

On 03/04/2010 02:25 AM, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2010-03-03 at 21:23 -0800, Scooter Morris wrote:
>    
>> Hi all,
>>       Just had a crash on our 3-node Red Hat Enterprise Linux 5.4 cluster
>> that looks a lot like
>> https://bugzilla.redhat.com/show_bug.cgi?id=520720.  We're running
>> kernel 2.6.18-164.11.1.el5.  Here is the traceback:
>>
>>      
> That seems a reasonable conclusion. I assume that you were running with
> one or more files with the journaled data flag set?
>
> [snip]
>    
>> Since we're already running the latest 5.4 kernel, it's not clear what
>> might be going on here.  There is a note in the bug about making sure
>> the gfs2-kmod from 5.2 isn't still around.  Which version of gfs2-kmod is
>> the old one, or should I just remove all instances of gfs2-kmod?
>>
>> -- scooter
>>
>>      
> You can remove all versions of the kmod since they are all old. This is
> the result of a packaging issue (which we are attempting to solve by
> providing an empty kmod in future versions which will override the old
> one), but in the meantime, upgrades from 5.2 or before require the old
> gfs2 kmod to be removed manually.
>
> I don't see any sign of the kmod in the stack trace you sent, though, so
> I suspect it's not an issue in this case. Certainly worth checking, though,
> to be certain.
>
> Steve.
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>    
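Steve's question about the journaled data flag fits the trace, which passes through `gfs2_jdata_writepage`. A minimal sketch for listing such files follows. It assumes x86_64 values from `<linux/fs.h>` (`FS_IOC_GETFLAGS` = `0x80086601`, `FS_JOURNAL_DATA_FL` = `0x4000`, the "j" attribute that `lsattr` shows) and assumes the GFS2 in the running kernel answers `FS_IOC_GETFLAGS`, the same ioctl `lsattr` uses. Files on filesystems that do not support it are silently skipped. The function names are hypothetical.

```python
import array
import fcntl
import os
import stat
from typing import List

# Assumed x86_64 constants from <linux/fs.h>.
FS_IOC_GETFLAGS = 0x80086601       # _IOR('f', 1, long)
FS_JOURNAL_DATA_FL = 0x00004000    # per-inode "journaled data" flag


def has_jdata_flag(flags: int) -> bool:
    """True if the inode flag word has the journaled-data bit set."""
    return bool(flags & FS_JOURNAL_DATA_FL)


def get_inode_flags(path: str) -> int:
    """Read the inode flags of a regular file via FS_IOC_GETFLAGS."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        buf = array.array("i", [0])          # kernel writes a 32-bit value
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf, True)
        return buf[0]
    finally:
        os.close(fd)


def find_jdata_files(root: str) -> List[str]:
    """Walk root and return regular files that carry the jdata flag."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks, FIFOs, and device nodes; only check real files.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                if has_jdata_flag(get_inode_flags(path)):
                    hits.append(path)
            except OSError:
                # Unreadable file or filesystem without FS_IOC_GETFLAGS.
                continue
    return hits
```

Calling `find_jdata_files("/path/to/gfs2/mount")` on one node would list candidate files; the mount point here is a placeholder.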



