[Cluster-devel] [PATCH] dlm: remove O_NONBLOCK flag in sctp_connect_to_sock

Steven Whitehouse swhiteho at redhat.com
Wed May 30 09:23:15 UTC 2018


Hi,


On 29/05/18 04:09, Gang He wrote:
> We should remove the O_NONBLOCK flag when calling sock->ops->connect()
> in the sctp_connect_to_sock() function.
> Why?
> 1. Up to now, the sctp socket connect() function has ignored the flag
> argument, so the O_NONBLOCK flag has no effect. We should remove it to
> avoid confusion (though this is not urgent).
> 2. In the future, a patch will fix this problem and the flag argument
> will take effect. That patch has been queued at
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/net/sctp?id=644fbdeacf1d3edd366e44b8ba214de9d1dd66a9
> However, the O_NONBLOCK flag will then make sock->ops->connect() return
> immediately without waiting, so the connection will not be established and
> the DLM kernel module will call sock->ops->connect() again and again. As a
> result, CPU usage is almost 100%, and a soft lockup can even be triggered
> if the related configuration options are enabled.
> The DLM kernel module also prints lots of messages like:
> [Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
> [Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
> [Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
> [Fri Apr 27 11:23:43 2018] dlm: connecting to 172167592
> The upper-layer application (e.g. the ocfs2 mount command) hangs in
> new_lockspace(); the whole backtrace is shown below:
> tb0307-nd2:~ # cat /proc/2935/stack
> [<0>] new_lockspace+0x957/0xac0 [dlm]
> [<0>] dlm_new_lockspace+0xae/0x140 [dlm]
> [<0>] user_cluster_connect+0xc3/0x3a0 [ocfs2_stack_user]
> [<0>] ocfs2_cluster_connect+0x144/0x220 [ocfs2_stackglue]
> [<0>] ocfs2_dlm_init+0x215/0x440 [ocfs2]
> [<0>] ocfs2_fill_super+0xcb0/0x1290 [ocfs2]
> [<0>] mount_bdev+0x173/0x1b0
> [<0>] mount_fs+0x35/0x150
> [<0>] vfs_kern_mount.part.23+0x54/0x100
> [<0>] do_mount+0x59a/0xc40
> [<0>] SyS_mount+0x80/0xd0
> [<0>] do_syscall_64+0x76/0x140
> [<0>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [<0>] 0xffffffffffffffff
>
> So, I think we should remove the O_NONBLOCK flag here, since the DLM kernel
> module cannot handle a non-blocking socket in connect() properly.
>
> Signed-off-by: Gang He <ghe at suse.com>
> ---
>   fs/dlm/lowcomms.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index d31e9abfb9f1..a5e4a221435c 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -1092,7 +1092,7 @@ static void sctp_connect_to_sock(struct connection *con)
>   	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
>   			  sizeof(tv));
>   	result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
> -				   O_NONBLOCK);
> +				   0);
>   	memset(&tv, 0, sizeof(tv));
>   	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
>   			  sizeof(tv));

Makes sense to me... there is no point in setting up a timeout and then
using O_NONBLOCK.

Steve.
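
For readers who want to try the pattern outside the kernel, here is a rough userspace analogue of what the patched code does: set SO_SNDTIMEO, call connect() in blocking mode so the timeout bounds the wait, then clear the timeout again. This is only a sketch and not the DLM code. It uses TCP over loopback rather than SCTP, and the names connect_with_timeout() and demo_loopback_connect() are hypothetical helpers invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: blocking connect() bounded by SO_SNDTIMEO.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
int connect_with_timeout(int fd, const struct sockaddr *addr,
			 socklen_t len, int timeout_sec)
{
	struct timeval tv;
	int flags, ret, saved_errno;

	assert(timeout_sec > 0);

	/* Make sure the socket is blocking; otherwise the timeout is moot. */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
		return -1;

	/* On Linux, the send timeout also bounds a blocking connect(). */
	memset(&tv, 0, sizeof(tv));
	tv.tv_sec = timeout_sec;
	if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
		return -1;

	ret = connect(fd, addr, len);
	saved_errno = errno;

	/* Clear the timeout again so later sends are not affected. */
	memset(&tv, 0, sizeof(tv));
	setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

	errno = saved_errno;
	return ret;
}

/*
 * Hypothetical demo: listen on an ephemeral loopback port and connect
 * to it with a 5 second bound. Returns 0 if the connect succeeded.
 */
int demo_loopback_connect(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, ret = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
		close(lfd);
		return -1;
	}

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd >= 0) {
		ret = connect_with_timeout(cfd, (struct sockaddr *)&addr,
					   len, 5);
		close(cfd);
	}
	close(lfd);
	return ret;
}
```

Calling demo_loopback_connect() from a small main() should return 0 on a machine with a working loopback interface. If the socket were left non-blocking instead, connect() could return before the connection is established, which is the situation the commit message describes as leading to the retry loop.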



