[Linux-cluster] gfs2 blocking tasks

Digimer lists at alteeve.ca
Sun Aug 19 15:45:07 UTC 2012


On 08/19/2012 05:52 AM, Bart Verwilst wrote:
> Hi,
> 
> I have a 3-node cluster in testing which seems to work quite well (cman,
> rgmanager, gfs2, etc.).
> On (only) one of my nodes, yesterday I noticed the message below in dmesg.
> 
> I saw this 30 minutes after the fact. I could browse both my gfs2
> mounts, and there was no fencing or anything on any node.
> 
> Any idea what might have caused this, and why it then went away?
> 
> Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task
> kworker/1:0:3117 blocked for more than 120 seconds.
> Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0     D
> ffff88032fc93900     0  3117      2 0x00000000
> Aug 19 00:10:01 vm02-test kernel: [282120.240302]  ffff8802bb4dfb30
> 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
> Aug 19 00:10:01 vm02-test kernel: [282120.240307]  ffff8802bb4dffd8
> ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
> Aug 19 00:10:01 vm02-test kernel: [282120.240311]  0000000000000286
> ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
> Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
> Aug 19 00:10:01 vm02-test kernel: [282120.240334]  [<ffffffffa0570290>]
> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240340]  [<ffffffff81666c89>]
> schedule+0x29/0x70
> Aug 19 00:10:01 vm02-test kernel: [282120.240349]  [<ffffffffa057029e>]
> gfs2_glock_holder_wait+0xe/0x20 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240352]  [<ffffffff81665400>]
> __wait_on_bit+0x60/0x90
> Aug 19 00:10:01 vm02-test kernel: [282120.240361]  [<ffffffffa0570290>]
> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240364]  [<ffffffff816654ac>]
> out_of_line_wait_on_bit+0x7c/0x90
> Aug 19 00:10:01 vm02-test kernel: [282120.240369]  [<ffffffff81073400>]
> ? autoremove_wake_function+0x40/0x40
> Aug 19 00:10:01 vm02-test kernel: [282120.240378]  [<ffffffffa05713a7>]
> wait_on_holder+0x47/0x80 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240388]  [<ffffffffa05741d8>]
> gfs2_glock_nq+0x328/0x450 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240399]  [<ffffffffa058a8ca>]
> gfs2_check_blk_type+0x4a/0x150 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240410]  [<ffffffffa058a8c1>]
> ? gfs2_check_blk_type+0x41/0x150 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240421]  [<ffffffffa058ba0c>]
> gfs2_evict_inode+0x2cc/0x360 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240432]  [<ffffffffa058b842>]
> ? gfs2_evict_inode+0x102/0x360 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240437]  [<ffffffff811940c2>]
> evict+0xb2/0x1b0
> Aug 19 00:10:01 vm02-test kernel: [282120.240440]  [<ffffffff811942c9>]
> iput+0x109/0x210
> Aug 19 00:10:01 vm02-test kernel: [282120.240448]  [<ffffffffa0572fdc>]
> delete_work_func+0x5c/0x90 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240453]  [<ffffffff8106d5fa>]
> process_one_work+0x12a/0x420
> Aug 19 00:10:01 vm02-test kernel: [282120.240462]  [<ffffffffa0572f80>]
> ? gfs2_holder_uninit+0x40/0x40 [gfs2]
> Aug 19 00:10:01 vm02-test kernel: [282120.240465]  [<ffffffff8106e19e>]
> worker_thread+0x12e/0x2f0
> Aug 19 00:10:01 vm02-test kernel: [282120.240469]  [<ffffffff8106e070>]
> ? manage_workers.isra.25+0x200/0x200
> Aug 19 00:10:01 vm02-test kernel: [282120.240472]  [<ffffffff81072e73>]
> kthread+0x93/0xa0
> Aug 19 00:10:01 vm02-test kernel: [282120.240477]  [<ffffffff816710a4>]
> kernel_thread_helper+0x4/0x10
> Aug 19 00:10:01 vm02-test kernel: [282120.240480]  [<ffffffff81072de0>]
> ? flush_kthread_worker+0x80/0x80
> Aug 19 00:10:01 vm02-test kernel: [282120.240484]  [<ffffffff816710a0>]
> ? gs_change+0x13/0x13
> Aug 19 00:12:01 vm02-test kernel: [282240.240061] INFO: task
> kworker/1:0:3117 blocked for more than 120 seconds.
> [same call trace as above, repeated verbatim every 120 seconds]
> <snip, goes on for a while>
> 
> Kind regards,
> 
> Bart

I usually see this when DLM is blocked, and DLM usually blocks because a
fence action failed.
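
If you want to rule that out, the DLM and fencing state can be checked
directly on the node. A few commands worth running (names assume the
cman-era tooling you're using; adjust for your distro):

  # List the DLM lockspaces known to this node
  dlm_tool ls

  # Show fence domain membership and any pending fence actions
  fence_tool ls

  # Overall cluster membership and quorum state
  cman_tool status

If a fence action failed or never completed, that should show up in the
fence_tool output and in syslog on the node that tried to do the fencing.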

To clarify: this comes up on only one of the three nodes? On the node
with these messages, you shouldn't be able to look at the hung FS while
it's blocked.
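
A quick, safe way to test that is to wrap the access in a timeout so
your shell doesn't hang along with it (the mount point below is just an
example; substitute yours):

  timeout 10 ls /mnt/gfs2-vol1 || echo "FS blocked or timed out"

You can also dump the glock state for the filesystem from debugfs,
which is usually the first thing the GFS2 developers ask for:

  mount -t debugfs none /sys/kernel/debug    # if not already mounted
  cat /sys/kernel/debug/gfs2/<clustername:fsname>/glocks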

Can you share your versions and cluster.conf, please? Also, what is in
the logs in the three or four minutes before these messages start?
Anything interesting in the logs of the other nodes around the same time?
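
Something like the following should collect most of what I'm after
(package query assumes a RHEL/CentOS-style install; use dpkg -l on
Debian/Ubuntu):

  rpm -q cman rgmanager gfs2-utils     # component versions
  uname -r                             # kernel version
  cat /etc/cluster/cluster.conf
  # log context in the minutes before the first report at 00:10:01
  grep 'Aug 19 00:0' /var/log/messages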

digimer

-- 
Digimer
Papers and Projects: https://alteeve.com