[rhelv6-list] RHEL6.2 Kernel/EXT4 bug
Musayev, Ilya
imusayev at webmd.net
Thu Jan 12 17:14:14 UTC 2012
Curious if anyone has seen this in their RHEL6.2 setups, if you have 6.1 or 6.2 please try this out and see what happens. List of commands to reproduce is below, latest iozone required.
https://bugzilla.redhat.com/show_bug.cgi?id=773377
Description of problem:
While in process of bench-marking various file systems, i came across an issue
where EXT4 partition would lock up under specific load. In current setup it was
under iozone testing. The I/O scheduler is set to default "noop".
*This only occurs with EXT4 and not XFS.
A repetitive message i'd see in messages file is attached:
Jan 9 23:38:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more
than 120 seconds.
Jan 9 23:38:51 minigiant5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 9 23:38:51 minigiant5 kernel: jbd2/md0-8 D 0000000000000005 0 16169
2 0x00000000
Jan 9 23:38:51 minigiant5 kernel: ffff88010153bc20 0000000000000046
0000000000000000 ffff88010153bbe4
Jan 9 23:38:51 minigiant5 kernel: 0000000000000000 ffff88012dc24c00
ffff880028215f80 0000000000000400
Jan 9 23:38:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8
000000000000f4e8 ffff880108f230f8
Jan 9 23:38:51 minigiant5 kernel: Call Trace:
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff814edd08>]
out_of_line_wait_on_bit+0x78/0x90
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ?
wake_bit_function+0x0/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811a9236>]
__wait_on_buffer+0x26/0x30
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa022a0e6>]
jbd2_journal_commit_transaction+0xa76/0x14b0 [jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff810096f0>] ?
__switch_to+0xd0/0x320
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220
[jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ?
autoremove_wake_function+0x0/0x40
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220
[jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan 9 23:38:51 minigiant5 kernel: INFO: task iozone:17280 blocked for more
than 120 seconds.
Jan 9 23:38:51 minigiant5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 9 23:38:51 minigiant5 kernel: iozone D 0000000000000000 0 17280
16178 0x00000000
Jan 9 23:38:51 minigiant5 kernel: ffff880127e779a8 0000000000000086
ffff880127e77998 00000000ffffffff
Jan 9 23:38:51 minigiant5 kernel: ffffffff81ea1100 0000000000000286
ffff880127e77958 ffffffff8107c99b
Jan 9 23:38:51 minigiant5 kernel: ffff8800b9071b38 ffff880127e77fd8
000000000000f4e8 ffff8800b9071b38
Jan 9 23:38:51 minigiant5 kernel: Call Trace:
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090cfe>] ?
prepare_to_wait+0x4e/0x80
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa022909d>]
do_get_write_access+0x29d/0x520 [jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ?
wake_bit_function+0x0/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0229471>]
jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa027db78>]
__ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0259253>]
ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa02592cc>]
ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa02595c0>]
ext4_dirty_inode+0x40/0x60 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8119fc6b>]
__mark_inode_dirty+0x3b/0x160
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811901e2>]
file_update_time+0xf2/0x170
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81112b50>]
__generic_file_aio_write+0x220/0x480
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8105e770>] ?
default_wake_function+0x0/0x20
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81112e1f>]
generic_file_aio_write+0x6f/0xe0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0252de1>]
ext4_file_write+0x61/0x1e0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811b4ccd>] ?
fsnotify_add_notify_event+0x12d/0x280
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811760fa>]
do_sync_write+0xfa/0x140
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811b4613>] ? fsnotify+0x113/0x160
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ?
autoremove_wake_function+0x0/0x40
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81218e2b>] ?
selinux_file_permission+0xfb/0x150
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8120c1d6>] ?
security_file_permission+0x16/0x20
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811763f8>] vfs_write+0xb8/0x1a0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81176e01>] sys_write+0x51/0x90
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81176753>] ? sys_lseek+0x53/0x80
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
Jan 9 23:38:51 minigiant5 kernel: INFO: task iozone:17282 blocked for more
than 120 seconds.
Jan 9 23:38:51 minigiant5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 9 23:38:51 minigiant5 kernel: iozone D 0000000000000000 0 17282
16178 0x00000000
Jan 9 23:38:51 minigiant5 kernel: ffff880108fe59a8 0000000000000082
0000000000000000 00000000ffffffff
Jan 9 23:38:51 minigiant5 kernel: ffffffff81ea1100 0000000000000286
ffff880108fe5958 ffffffff8107c99b
Jan 9 23:38:51 minigiant5 kernel: ffff880128e45078 ffff880108fe5fd8
000000000000f4e8 ffff880128e45078
Jan 9 23:38:51 minigiant5 kernel: Call Trace:
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8107c99b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa022909d>]
do_get_write_access+0x29d/0x520 [jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a50>] ?
wake_bit_function+0x0/0x50
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0229471>]
jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa027db78>]
__ext4_journal_get_write_access+0x38/0x80 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0259253>]
ext4_reserve_inode_write+0x73/0xa0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa02592cc>]
ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa02595c0>]
ext4_dirty_inode+0x40/0x60 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8119fc6b>]
__mark_inode_dirty+0x3b/0x160
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811901e2>]
file_update_time+0xf2/0x170
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81112b50>]
__generic_file_aio_write+0x220/0x480
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8105e770>] ?
default_wake_function+0x0/0x20
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81112e1f>]
generic_file_aio_write+0x6f/0xe0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffffa0252de1>]
ext4_file_write+0x61/0x1e0 [ext4]
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811b4ccd>] ?
fsnotify_add_notify_event+0x12d/0x280
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811760fa>]
do_sync_write+0xfa/0x140
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811b4613>] ? fsnotify+0x113/0x160
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81090a10>] ?
autoremove_wake_function+0x0/0x40
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81218e2b>] ?
selinux_file_permission+0xfb/0x150
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8120c1d6>] ?
security_file_permission+0x16/0x20
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff811763f8>] vfs_write+0xb8/0x1a0
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff81176e01>] sys_write+0x51/0x90
Jan 9 23:38:51 minigiant5 kernel: [<ffffffff8100b0f2>]
system_call_fastpath+0x16/0x1b
Jan 10 00:06:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more
than 120 seconds.
Jan 10 00:06:51 minigiant5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 00:06:51 minigiant5 kernel: jbd2/md0-8 D 0000000000000005 0 16169
2 0x00000000
Jan 10 00:06:51 minigiant5 kernel: ffff88010153bc20 0000000000000046
ffff88010153bbe8 ffff88010153bbe4
Jan 10 00:06:51 minigiant5 kernel: 0000000000000001 ffff88012dc24c00
ffff8800282f5f80 0000000000000400
Jan 10 00:06:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8
000000000000f4e8 ffff880108f230f8
Jan 10 00:06:51 minigiant5 kernel: Call Trace:
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff814edd08>]
out_of_line_wait_on_bit+0x78/0x90
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090a50>] ?
wake_bit_function+0x0/0x50
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff811a9236>]
__wait_on_buffer+0x26/0x30
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022a7a1>]
jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff810096f0>] ?
__switch_to+0xd0/0x320
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8107c99b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220
[jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090a10>] ?
autoremove_wake_function+0x0/0x40
Jan 10 00:06:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220
[jbd2]
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan 10 00:06:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan 10 00:16:51 minigiant5 kernel: INFO: task jbd2/md0-8:16169 blocked for more
than 120 seconds.
Jan 10 00:16:51 minigiant5 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 10 00:16:51 minigiant5 kernel: jbd2/md0-8 D 0000000000000005 0 16169
2 0x00000000
Jan 10 00:16:51 minigiant5 kernel: ffff88010153bc20 0000000000000046
ffff88010153bbe8 ffff88010153bbe4
Jan 10 00:16:51 minigiant5 kernel: 0000000000000001 ffff88012dc24c00
ffff880028235f80 0000000000000400
Jan 10 00:16:51 minigiant5 kernel: ffff880108f230f8 ffff88010153bfd8
000000000000f4e8 ffff880108f230f8
Jan 10 00:16:51 minigiant5 kernel: Call Trace:
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814ed2a3>] io_schedule+0x73/0xc0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9280>] sync_buffer+0x40/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814edc5f>] __wait_on_bit+0x5f/0x90
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9240>] ? sync_buffer+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff814edd08>]
out_of_line_wait_on_bit+0x78/0x90
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090a50>] ?
wake_bit_function+0x0/0x50
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff811a9236>]
__wait_on_buffer+0x26/0x30
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022a7a1>]
jbd2_journal_commit_transaction+0x1131/0x14b0 [jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff810096f0>] ?
__switch_to+0xd0/0x320
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8107c99b>] ?
try_to_del_timer_sync+0x7b/0xe0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022f928>] kjournald2+0xb8/0x220
[jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090a10>] ?
autoremove_wake_function+0x0/0x40
Jan 10 00:16:51 minigiant5 kernel: [<ffffffffa022f870>] ? kjournald2+0x0/0x220
[jbd2]
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff810906a6>] kthread+0x96/0xa0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff81090610>] ? kthread+0x0/0xa0
Jan 10 00:16:51 minigiant5 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Version-Release number of selected component (if applicable):
[root at minigiant5 ~]# uname -a
Linux minigiant5 2.6.32-220.2.1.el6.x86_64 #1 SMP Tue Dec 13 16:21:34 EST 2011
x86_64 x86_64 x86_64 GNU/Linux
[root at minigiant5 ~]# dmsetup version
Library version: 1.02.66-RHEL6 (2011-10-12)
Driver version: 4.22.6
2 x 1 TB SATA drives with software RAID 1.
scsi-SATA_SAMSUNG_HD103SJS246J9KB510183
scsi-SATA_SAMSUNG_HD103SJS246J9KB510187
System Information
Manufacturer: Dell Inc.
Product Name: OptiPlex 990
RAM: 4GB
CPU: Quad Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz with HT
How reproducible:
Always
Steps to Reproduce:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext4 /dev/md0
mount /dev/md0 /storage/ -o noatime
iozone -R -l 5 -u 5 -r 4k -s 5000m -F /storage/f1 /storage/f2 /storage/f3
/storage/f4 /storage/f5
Actual results:
Process takes much longer to complete due to EXT4 FS lockups
Expected results:
The partition along with the process should not lock due to FS issues
Additional info:
It takes a to run while but eventually locks up the partition.
Ilya Musayev
- Systems Architect
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/rhelv6-list/attachments/20120112/57252791/attachment.htm>
More information about the rhelv6-list
mailing list