[rhelv6-list] RHEL6.2 Kernel/EXT4 bug

Musayev, Ilya imusayev at webmd.net
Fri Jan 20 19:12:03 UTC 2012


So I broke the raid and used a single partition for iozone testing with 5000MB chunks. No errors reported.
It's the combination of the EXT4 and mdraid modules that causes the soft lockups. :(

Benchmarking results to be released at a later time.
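For reference, here is a minimal sketch of the single-partition run described above. The exact iozone parameters are in the bugzilla report linked below; the device names, mount point, and record size here are placeholders, not the ones from my setup:

    # stop the md array and reuse one member as a plain partition (example devices)
    mdadm --stop /dev/md0
    mdadm --zero-superblock /dev/sdb1
    mkfs.ext4 /dev/sdb1
    mount /dev/sdb1 /mnt/test

    # noop scheduler, as discussed in the thread below
    echo noop > /sys/block/sdb/queue/scheduler

    # 5000MB sequential write/rewrite and read/reread pass
    iozone -i 0 -i 1 -r 64k -s 5000m -f /mnt/test/iozone.tmp

    # watch for soft lockups or ext4 errors while it runs
    tail -f /var/log/messages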

-----Original Message-----
From: rhelv6-list-bounces at redhat.com [mailto:rhelv6-list-bounces at redhat.com] On Behalf Of Brian Long
Sent: Thursday, January 12, 2012 3:12 PM
To: rhelv6-list at redhat.com
Subject: Re: [rhelv6-list] RHEL6.2 Kernel/EXT4 bug

I responded too quickly.  I thought it was finished, but it is still running.  I thought RHEL used cfq by default.

/Brian/

On 1/12/12 2:54 PM, Musayev, Ilya wrote:
> I guess I can break the raid and try again on a single drive. I will let you know what happens.
> 
> Did you actually do the 5000MB test with iozone? 
> 
> My 100MB and 1000MB runs are fine; it is only when I go up to the larger 5000MB range with iozone that I start having issues. I could probably narrow it down and find the exact break point, but I don't think it should matter - this should not happen at all, and it does not occur with XFS. At this point I'm leaning more toward XFS, since I get metrics that are better than or on par with EXT4, without any of these issues.
> 
> I'm also curious why your I/O scheduler was set to cfq; if I recall correctly, noop should have been the default.
> 
> -----Original Message-----
> From: rhelv6-list-bounces at redhat.com 
> [mailto:rhelv6-list-bounces at redhat.com] On Behalf Of Brian Long
> Sent: Thursday, January 12, 2012 1:41 PM
> To: rhelv6-list at redhat.com
> Subject: Re: [rhelv6-list] RHEL6.2 Kernel/EXT4 bug
> 
> On 1/12/12 12:14 PM, Musayev, Ilya wrote:
>> Curious if anyone has seen this in their RHEL6.2 setups. If you have
>> 6.1 or 6.2, please try this out and see what happens. The list of commands 
>> to reproduce is below; the latest iozone is required.
>>
>>  
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=773377
> 
> I put the same kernel on my RH 6.2 workstation with a single drive and ran iozone with the same parameters.  I don't have the drive mirrored and I had to change the scheduler to noop since it was cfq by default.
> 
> The only partition I had with enough free space is encrypted, so kcryptd was taking 100% CPU while running iozone.  Have you narrowed it down to md-only?  What happens if you run the same test on just one of your drives?
> 
> I got a kernel oops early on, but no ext4 errors:
> Jan 12 12:51:26 brilong-lnx2 kernel: ------------[ cut here ]------------
> Jan 12 12:51:26 brilong-lnx2 kernel: WARNING: at kernel/sched.c:5914 thread_return+0x232/0x79d() (Not tainted)
> Jan 12 12:51:26 brilong-lnx2 kernel: Hardware name: IBM System x3200 -[4362PAY]-
> Jan 12 12:51:26 brilong-lnx2 kernel: Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput sg microcode serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support tg3 i3000_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom pata_acpi ata_generic ata_piix radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Jan 12 12:51:26 brilong-lnx2 kernel: Pid: 23, comm: kblockd/1 Not tainted 2.6.32-220.2.1.el6.x86_64 #1
> Jan 12 12:51:26 brilong-lnx2 kernel: Call Trace:
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff810699ea>] ? warn_slowpath_null+0x1a/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff814eccc5>] ? thread_return+0x232/0x79d
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff812494d0>] ? blk_unplug_work+0x0/0x70
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff812494d0>] ? blk_unplug_work+0x0/0x70
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8108b15c>] ? worker_thread+0x1fc/0x2a0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81090a10>] ? autoremove_wake_function+0x0/0x40
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8108af60>] ? worker_thread+0x0/0x2a0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff810906a6>] ? kthread+0x96/0xa0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
> Jan 12 12:51:26 brilong-lnx2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
> Jan 12 12:51:26 brilong-lnx2 kernel: ---[ end trace aeef27db2e12775f ]---
> 
> /Brian/


-- 
       Brian Long                             |       |
       Corporate Security Programs Org    . | | | . | | | .
                                              '       '
                                              C I S C O

_______________________________________________
rhelv6-list mailing list
rhelv6-list at redhat.com
https://www.redhat.com/mailman/listinfo/rhelv6-list





