[Linux-cluster] DLM nodes disconnected issue

emmanuel segura emi2fast at gmail.com
Mon Apr 7 07:44:03 UTC 2014


Is your fencing working? I ask because I see this in your dlm lockspace
status: "new status    wait_messages 0 wait_condition 1 fencing".
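
A quick way to verify (a minimal sketch, assuming the standard cman/fenced
tooling; adjust for your setup):

    # show the fence domain state and whether a fence is still pending
    fence_tool ls

    # show per-node fence history as cman sees it
    cman_tool nodes -f

    # confirm the clvmd lockspace is still blocked (kern_stop / fencing)
    dlm_tool ls

If fenced shows a pending fence that never completes, the lockspace will
stay in kern_stop until the fence succeeds or is acknowledged (with manual
fencing, fence_ack_manual; its syntax differs between cluster versions,
check the man page).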


2014-04-07 9:26 GMT+02:00 Bjoern Teipel <bjoern.teipel at internetbrands.com>:

> Hi all,
>
> I ran "dlm_tool leave clvmd" on one node (node06) of a CMAN cluster with
> CLVMD.
> Now clvmd is stuck and all nodes have lost their connections to DLM.
> It looks like DLM wants to fence member 8, I guess, and that might be
> hanging the whole DLM?
> All the other stacks (cman, corosync) look fine...
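>
> For reference, roughly the sequence involved (a minimal sketch; the clvmd
> lockspace name matches the status output below):
>
>     # on node06: leave the clvmd lockspace (the command I ran)
>     dlm_tool leave clvmd
>
>     # the lockspace status and dump below came from:
>     dlm_tool ls
>     dlm_tool dump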
>
> Thanks,
> Bjoern
>
> Error:
>
> dlm: closing connection to node 2
> dlm: closing connection to node 3
> dlm: closing connection to node 4
> dlm: closing connection to node 5
> dlm: closing connection to node 6
> dlm: closing connection to node 8
> dlm: closing connection to node 9
> dlm: closing connection to node 10
> dlm: closing connection to node 2
> dlm: closing connection to node 3
> dlm: closing connection to node 4
> dlm: closing connection to node 5
> dlm: closing connection to node 6
> dlm: closing connection to node 8
> dlm: closing connection to node 9
> dlm: closing connection to node 10
> INFO: task dlm_tool:33699 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> dlm_tool      D 0000000000000003     0 33699  33698 0x00000080
>  ffff88138905dcc0 0000000000000082 ffffffff81168043 ffff88138905dd18
>  ffff88138905dd08 ffff88305b30ccc0 ffff88304fa5c800 ffff883058e49900
>  ffff881857329058 ffff88138905dfd8 000000000000fb88 ffff881857329058
> Call Trace:
>  [<ffffffff81168043>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
>  [<ffffffff8132f79a>] ? misc_open+0x1ca/0x320
>  [<ffffffff81510725>] rwsem_down_failed_common+0x95/0x1d0
>  [<ffffffff81185505>] ? chrdev_open+0x125/0x230
>  [<ffffffff815108b6>] rwsem_down_read_failed+0x26/0x30
>  [<ffffffff8117e5ff>] ? __dentry_open+0x23f/0x360
>  [<ffffffff81283894>] call_rwsem_down_read_failed+0x14/0x30
>  [<ffffffff8150fdb4>] ? down_read+0x24/0x30
>  [<ffffffffa06d948d>] dlm_clear_proc_locks+0x3d/0x2a0 [dlm]
>  [<ffffffff811dfed6>] ? generic_acl_chmod+0x46/0xd0
>  [<ffffffffa06e4b36>] device_close+0x66/0xc0 [dlm]
>  [<ffffffff81182b45>] __fput+0xf5/0x210
>  [<ffffffff81182c85>] fput+0x25/0x30
>  [<ffffffff8117e0dd>] filp_close+0x5d/0x90
>  [<ffffffff8117e1b5>] sys_close+0xa5/0x100
>  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
>
>
> Status:
>
> cman_tool nodes
> Node  Sts   Inc   Joined               Name
>    1   M  18908   2014-03-24 19:01:00  node01
>    2   M  18972   2014-04-06 22:47:57  node02
>    3   M  18972   2014-04-06 22:47:57  node03
>    4   M  18972   2014-04-06 22:47:57  node04
>    5   M  18972   2014-04-06 22:47:57  node05
>    6   X  18960                        node06
>    7   X  18928                        node07
>    8   M  18972   2014-04-06 22:47:57  node08
>    9   M  18972   2014-04-06 22:47:57  node09
>   10   M  18972   2014-04-06 22:47:57  node10
>
> dlm lockspaces
> name          clvmd
> id            0x4104eefa
> flags         0x00000004 kern_stop
> change        member 8 joined 0 remove 1 failed 0 seq 11,11
> members       1 2 3 4 5 8 9 10
> new change    member 8 joined 1 remove 0 failed 0 seq 12,41
> new status    wait_messages 0 wait_condition 1 fencing
> new members   1 2 3 4 5 8 9 10
>
>
>
> DLM dump:
> 1396849677 cluster node 2 added seq 18972
> 1396849677 set_configfs_node 2 10.14.18.66 local 0
> 1396849677 cluster node 3 added seq 18972
> 1396849677 set_configfs_node 3 10.14.18.67 local 0
> 1396849677 cluster node 4 added seq 18972
> 1396849677 set_configfs_node 4 10.14.18.68 local 0
> 1396849677 cluster node 5 added seq 18972
> 1396849677 set_configfs_node 5 10.14.18.70 local 0
> 1396849677 cluster node 8 added seq 18972
> 1396849677 set_configfs_node 8 10.14.18.80 local 0
> 1396849677 cluster node 9 added seq 18972
> 1396849677 set_configfs_node 9 10.14.18.81 local 0
> 1396849677 cluster node 10 added seq 18972
> 1396849677 set_configfs_node 10 10.14.18.77 local 0
> 1396849677 dlm:ls:clvmd conf 2 1 0 memb 1 3 join 3 left
> 1396849677 clvmd add_change cg 35 joined nodeid 3
> 1396849677 clvmd add_change cg 35 counts member 2 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 3 1 0 memb 1 2 3 join 2 left
> 1396849677 clvmd add_change cg 36 joined nodeid 2
> 1396849677 clvmd add_change cg 36 counts member 3 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 4 1 0 memb 1 2 3 9 join 9 left
> 1396849677 clvmd add_change cg 37 joined nodeid 9
> 1396849677 clvmd add_change cg 37 counts member 4 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 5 1 0 memb 1 2 3 8 9 join 8 left
> 1396849677 clvmd add_change cg 38 joined nodeid 8
> 1396849677 clvmd add_change cg 38 counts member 5 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
> 1396849677 clvmd add_change cg 39 joined nodeid 10
> 1396849677 clvmd add_change cg 39 counts member 6 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
> 1396849677 clvmd add_change cg 40 joined nodeid 5
> 1396849677 clvmd add_change cg 40 counts member 7 joined 1 remove 0 failed 0
> 1396849677 dlm:ls:clvmd conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left
> 1396849677 clvmd add_change cg 41 joined nodeid 4
> 1396849677 clvmd add_change cg 41 counts member 8 joined 1 remove 0 failed 0
> 1396849677 dlm:controld conf 2 1 0 memb 1 3 join 3 left
> 1396849677 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left
> 1396849677 dlm:controld conf 4 1 0 memb 1 2 3 9 join 9 left
> 1396849677 dlm:controld conf 5 1 0 memb 1 2 3 8 9 join 8 left
> 1396849677 dlm:controld conf 6 1 0 memb 1 2 3 8 9 10 join 10 left
> 1396849677 dlm:controld conf 7 1 0 memb 1 2 3 5 8 9 10 join 5 left
> 1396849677 dlm:controld conf 8 1 0 memb 1 2 3 4 5 8 9 10 join 4 left
>



-- 
this is my life and I live it for as long as God wills