[Linux-cluster] gfs2 blocking tasks
Bart Verwilst
lists at verwilst.be
Sun Aug 19 21:34:45 UTC 2012
Well, nothing strange there either, I think. Nothing on vm01-test, for
example, during that period; vm02-test gives me:
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:09:57 vm02-test corosync[7394]: [CMAN ] memb:
quorum_device_timer_fn
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:09:58 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:09:59 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:10:00 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:10:01 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task
kworker/1:0:3117 blocked for more than 120 seconds.
Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0 D
ffff88032fc93900 0 3117 2 0x00000000
Aug 19 00:10:01 vm02-test kernel: [282120.240302] ffff8802bb4dfb30
0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
Aug 19 00:10:01 vm02-test kernel: [282120.240307] ffff8802bb4dffd8
ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
Aug 19 00:10:01 vm02-test kernel: [282120.240311] 0000000000000286
ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
Aug 19 00:10:01 vm02-test kernel: [282120.240334] [<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240340] [<ffffffff81666c89>]
schedule+0x29/0x70
Aug 19 00:10:01 vm02-test kernel: [282120.240349] [<ffffffffa057029e>]
gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240352] [<ffffffff81665400>]
__wait_on_bit+0x60/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240361] [<ffffffffa0570290>]
? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240364] [<ffffffff816654ac>]
out_of_line_wait_on_bit+0x7c/0x90
Aug 19 00:10:01 vm02-test kernel: [282120.240369] [<ffffffff81073400>]
? autoremove_wake_function+0x40/0x40
Aug 19 00:10:01 vm02-test kernel: [282120.240378] [<ffffffffa05713a7>]
wait_on_holder+0x47/0x80 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240388] [<ffffffffa05741d8>]
gfs2_glock_nq+0x328/0x450 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240399] [<ffffffffa058a8ca>]
gfs2_check_blk_type+0x4a/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240410] [<ffffffffa058a8c1>]
? gfs2_check_blk_type+0x41/0x150 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240421] [<ffffffffa058ba0c>]
gfs2_evict_inode+0x2cc/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240432] [<ffffffffa058b842>]
? gfs2_evict_inode+0x102/0x360 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240437] [<ffffffff811940c2>]
evict+0xb2/0x1b0
Aug 19 00:10:01 vm02-test kernel: [282120.240440] [<ffffffff811942c9>]
iput+0x109/0x210
Aug 19 00:10:01 vm02-test kernel: [282120.240448] [<ffffffffa0572fdc>]
delete_work_func+0x5c/0x90 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240453] [<ffffffff8106d5fa>]
process_one_work+0x12a/0x420
Aug 19 00:10:01 vm02-test kernel: [282120.240462] [<ffffffffa0572f80>]
? gfs2_holder_uninit+0x40/0x40 [gfs2]
Aug 19 00:10:01 vm02-test kernel: [282120.240465] [<ffffffff8106e19e>]
worker_thread+0x12e/0x2f0
Aug 19 00:10:01 vm02-test kernel: [282120.240469] [<ffffffff8106e070>]
? manage_workers.isra.25+0x200/0x200
Aug 19 00:10:01 vm02-test kernel: [282120.240472] [<ffffffff81072e73>]
kthread+0x93/0xa0
Aug 19 00:10:01 vm02-test kernel: [282120.240477] [<ffffffff816710a4>]
kernel_thread_helper+0x4/0x10
Aug 19 00:10:01 vm02-test kernel: [282120.240480] [<ffffffff81072de0>]
? flush_kthread_worker+0x80/0x80
Aug 19 00:10:01 vm02-test kernel: [282120.240484] [<ffffffff816710a0>]
? gs_change+0x13/0x13
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 800000b7
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 800000b7
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 0
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 0
Aug 19 00:10:02 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply c00000b7 to fd 17
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: client
command is 7
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: About to
process command
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: command to
process is 7
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: allocated new buffer (retsize=1024)
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb:
get_all_members: retlen = 1760
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] memb: command
return code is 4
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: Returning
command data. length = 1760
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: sending
reply 40000007 to fd 17
Aug 19 00:10:03 vm02-test corosync[7394]: [CMAN ] daemon: read 20
bytes from fd 17
Does this help in any way?
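If it wedges again, I could also grab a glock dump from debugfs on the
stuck node; holders flagged as waiting should show which lock
kworker/1:0 is queued behind. A rough sketch of the filtering (the dump
path depends on the cluster:fsname table, and the sample lines below are
made up to illustrate the two-line G:/H: format, not taken from my
nodes):

```shell
# On the hung node the real dump would come from
#   /sys/kernel/debug/gfs2/<clustername:fsname>/glocks
# (debugfs must be mounted). The heredoc below is an illustrative
# stand-in in the same format, just to show the filtering step.
cat > /tmp/glocks.sample <<'EOF'
G:  s:SH n:2/5e1f f:lDq t:EX d:EX/0 a:0 v:0 r:3
 H: s:EX f:W e:0 p:3117 [kworker/1:0] gfs2_glock_nq+0x328/0x450
EOF
# Holder (H:) lines carrying the W flag are still waiting; -B1 also
# prints the owning glock (G:) line above each waiting holder.
grep -B1 'f:W' /tmp/glocks.sample
```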
Kind regards,
Bart
Digimer wrote on 19.08.2012 22:24:
> The config looks fine. Sorry, I should have specified syslog;
> /var/log/messages.
>
> On 08/19/2012 03:05 PM, Bart Verwilst wrote:
>> <cluster name="kvm" config_version="13">
>> <logging debug="on"/>
>> <clusternodes>
>> <clusternode name="vm01-test" nodeid="1">
>> <fence>
>> <method name="apc">
>> <device name="apc01" port="1" action="off"/>
>> <device name="apc02" port="1" action="off"/>
>> <device name="apc01" port="1" action="on"/>
>> <device name="apc02" port="1" action="on"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="vm02-test" nodeid="2">
>> <fence>
>> <method name="apc">
>> <device name="apc01" port="8" action="off"/>
>> <device name="apc02" port="8" action="off"/>
>> <device name="apc01" port="8" action="on"/>
>> <device name="apc02" port="8" action="on"/>
>> </method>
>> </fence>
>> </clusternode>
>> <clusternode name="vm03-test" nodeid="3">
>> <fence>
>> <method name="apc">
>> <device name="apc01" port="2" action="off"/>
>> <device name="apc02" port="2" action="off"/>
>> <device name="apc01" port="2" action="on"/>
>> <device name="apc02" port="2" action="on"/>
>> </method>
>> </fence>
>> </clusternode>
>> </clusternodes>
>> <fencedevices>
>> <fencedevice agent="fence_apc" ipaddr="apc01" secure="on"
>> login="device" name="apc01" passwd="xxx"/>
>> <fencedevice agent="fence_apc" ipaddr="apc02" secure="on"
>> login="device" name="apc02" passwd="xxx"/>
>> </fencedevices>
>> <rm log_level="5">
>> <failoverdomains>
>> <failoverdomain name="any_node" nofailback="1"
>> ordered="0"
>> restricted="0"/>
>> </failoverdomains>
>> <vm domain="any_node" max_restarts="2" migrate="live"
>> name="firewall" path="/etc/libvirt/qemu/" recovery="restart"
>> restart_expire_time="600"/>
>> <vm domain="any_node" max_restarts="2" migrate="live"
>> name="zabbix" path="/etc/libvirt/qemu/" recovery="restart"
>> restart_expire_time="600"/>
>> </rm>
>> <totem rrp_mode="none" secauth="off"/>
>> <quorumd device="/dev/mapper/iscsi_cluster_quorum"></quorumd>
>> </cluster>
>>
>> dlm_control.log*, fenced.log, and corosync.log are either empty or
>> only contain entries up to Aug 15, or (in the case of fenced.log)
>> just "Aug 18 23:41:46 fenced logging mode 3 syslog f 160 p 6 logfile
>> p 7 /var/log/cluster/fenced.log". I was expecting at least some log
>> output, since everything seems to work fine. Maybe enabling more
>> debugging would bring something up (but debugging is already on in
>> cluster.conf).
>>
>> Kind regards,
>>
>> Bart
>>
>>> Digimer wrote on 19.08.2012 17:45:
>>> On 08/19/2012 05:52 AM, Bart Verwilst wrote:
>>>> Hi,
>>>>
>>>> I have a 3-node cluster in testing which seem to work quite well (
>>>> cman,
>>>> rgmanager, gfs2, etc ).
>>>> On (only) one of my nodes, yesterday i noticed the message below
>>>> in
>>>> dmesg.
>>>>
>>>> I saw this 30 minutes after the facts. I could browse both my gfs2
>>>> mounts, there was no fencing or anything on any node.
>>>>
>>>> Any idea what might have caused this, and then go away?
>>>>
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240067] INFO: task
>>>> kworker/1:0:3117 blocked for more than 120 seconds.
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240182] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240296] kworker/1:0
>>>> D
>>>> ffff88032fc93900 0 3117 2 0x00000000
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240302]
>>>> ffff8802bb4dfb30
>>>> 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240307]
>>>> ffff8802bb4dffd8
>>>> ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240311]
>>>> 0000000000000286
>>>> ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240316] Call Trace:
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240334]
>>>> [<ffffffffa0570290>]
>>>> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240340]
>>>> [<ffffffff81666c89>]
>>>> schedule+0x29/0x70
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240349]
>>>> [<ffffffffa057029e>]
>>>> gfs2_glock_holder_wait+0xe/0x20 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240352]
>>>> [<ffffffff81665400>]
>>>> __wait_on_bit+0x60/0x90
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240361]
>>>> [<ffffffffa0570290>]
>>>> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240364]
>>>> [<ffffffff816654ac>]
>>>> out_of_line_wait_on_bit+0x7c/0x90
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240369]
>>>> [<ffffffff81073400>]
>>>> ? autoremove_wake_function+0x40/0x40
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240378]
>>>> [<ffffffffa05713a7>]
>>>> wait_on_holder+0x47/0x80 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240388]
>>>> [<ffffffffa05741d8>]
>>>> gfs2_glock_nq+0x328/0x450 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240399]
>>>> [<ffffffffa058a8ca>]
>>>> gfs2_check_blk_type+0x4a/0x150 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240410]
>>>> [<ffffffffa058a8c1>]
>>>> ? gfs2_check_blk_type+0x41/0x150 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240421]
>>>> [<ffffffffa058ba0c>]
>>>> gfs2_evict_inode+0x2cc/0x360 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240432]
>>>> [<ffffffffa058b842>]
>>>> ? gfs2_evict_inode+0x102/0x360 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240437]
>>>> [<ffffffff811940c2>]
>>>> evict+0xb2/0x1b0
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240440]
>>>> [<ffffffff811942c9>]
>>>> iput+0x109/0x210
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240448]
>>>> [<ffffffffa0572fdc>]
>>>> delete_work_func+0x5c/0x90 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240453]
>>>> [<ffffffff8106d5fa>]
>>>> process_one_work+0x12a/0x420
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240462]
>>>> [<ffffffffa0572f80>]
>>>> ? gfs2_holder_uninit+0x40/0x40 [gfs2]
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240465]
>>>> [<ffffffff8106e19e>]
>>>> worker_thread+0x12e/0x2f0
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240469]
>>>> [<ffffffff8106e070>]
>>>> ? manage_workers.isra.25+0x200/0x200
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240472]
>>>> [<ffffffff81072e73>]
>>>> kthread+0x93/0xa0
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240477]
>>>> [<ffffffff816710a4>]
>>>> kernel_thread_helper+0x4/0x10
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240480]
>>>> [<ffffffff81072de0>]
>>>> ? flush_kthread_worker+0x80/0x80
>>>> Aug 19 00:10:01 vm02-test kernel: [282120.240484]
>>>> [<ffffffff816710a0>]
>>>> ? gs_change+0x13/0x13
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240061] INFO: task
>>>> kworker/1:0:3117 blocked for more than 120 seconds.
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240175] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240289] kworker/1:0
>>>> D
>>>> ffff88032fc93900 0 3117 2 0x00000000
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240294]
>>>> ffff8802bb4dfb30
>>>> 0000000000000046 ffff88031e4744d0 ffff8802bb4dffd8
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240299]
>>>> ffff8802bb4dffd8
>>>> ffff8802bb4dffd8 ffff88031e5796f0 ffff88031e4744d0
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240304]
>>>> 0000000000000286
>>>> ffff88032ffbd0f8 ffff8802bb4dfbc8 0000000000000002
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240309] Call Trace:
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240326]
>>>> [<ffffffffa0570290>]
>>>> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240332]
>>>> [<ffffffff81666c89>]
>>>> schedule+0x29/0x70
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240341]
>>>> [<ffffffffa057029e>]
>>>> gfs2_glock_holder_wait+0xe/0x20 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240345]
>>>> [<ffffffff81665400>]
>>>> __wait_on_bit+0x60/0x90
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240353]
>>>> [<ffffffffa0570290>]
>>>> ? gfs2_glock_demote_wait+0x20/0x20 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240357]
>>>> [<ffffffff816654ac>]
>>>> out_of_line_wait_on_bit+0x7c/0x90
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240362]
>>>> [<ffffffff81073400>]
>>>> ? autoremove_wake_function+0x40/0x40
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240371]
>>>> [<ffffffffa05713a7>]
>>>> wait_on_holder+0x47/0x80 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240380]
>>>> [<ffffffffa05741d8>]
>>>> gfs2_glock_nq+0x328/0x450 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240391]
>>>> [<ffffffffa058a8ca>]
>>>> gfs2_check_blk_type+0x4a/0x150 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240402]
>>>> [<ffffffffa058a8c1>]
>>>> ? gfs2_check_blk_type+0x41/0x150 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240413]
>>>> [<ffffffffa058ba0c>]
>>>> gfs2_evict_inode+0x2cc/0x360 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240424]
>>>> [<ffffffffa058b842>]
>>>> ? gfs2_evict_inode+0x102/0x360 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240429]
>>>> [<ffffffff811940c2>]
>>>> evict+0xb2/0x1b0
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240432]
>>>> [<ffffffff811942c9>]
>>>> iput+0x109/0x210
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240440]
>>>> [<ffffffffa0572fdc>]
>>>> delete_work_func+0x5c/0x90 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240445]
>>>> [<ffffffff8106d5fa>]
>>>> process_one_work+0x12a/0x420
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240454]
>>>> [<ffffffffa0572f80>]
>>>> ? gfs2_holder_uninit+0x40/0x40 [gfs2]
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240458]
>>>> [<ffffffff8106e19e>]
>>>> worker_thread+0x12e/0x2f0
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240462]
>>>> [<ffffffff8106e070>]
>>>> ? manage_workers.isra.25+0x200/0x200
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240465]
>>>> [<ffffffff81072e73>]
>>>> kthread+0x93/0xa0
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240469]
>>>> [<ffffffff816710a4>]
>>>> kernel_thread_helper+0x4/0x10
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240473]
>>>> [<ffffffff81072de0>]
>>>> ? flush_kthread_worker+0x80/0x80
>>>> Aug 19 00:12:01 vm02-test kernel: [282240.240476]
>>>> [<ffffffff816710a0>]
>>>> ? gs_change+0x13/0x13
>>>> <snip, goes on for a while>
>>>>
>>>> Kind regards,
>>>>
>>>> Bart
>>>
>>> I usually see this when DLM is blocked. DLM usually blocks on a
>>> failed
>>> fence action.
>>>
>>> To clarify: this comes up on only one of the three nodes? On the
>>> node showing these messages, you shouldn't be able to access the
>>> hung FS at all.
>>>
>>> Can you share your versions and cluster.conf, please? Also, what is
>>> in the logs in the three or four minutes before these messages
>>> start? Anything interesting in the log files of the other nodes
>>> around the same time period?
>>>
>>> digimer
>>