[Cluster-devel] [DLM PATCH] dlm: Don't swamp the CPU with callbacks queued during recovery

Steven Whitehouse swhiteho at redhat.com
Fri Nov 9 10:14:49 UTC 2018


Hi,


On 08/11/18 19:04, Bob Peterson wrote:
> Hi,
>
> Before this patch, recovery would cause all callbacks to be delayed,
> put on a queue, and afterward they were all queued to the callback
> work queue. This patch does the same thing, but occasionally takes
> a break after 25 of them so it won't swamp the CPU at the expense
> of other RT processes like corosync.
>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> ---
>   fs/dlm/ast.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
> index 562fa8c3edff..47ee66d70109 100644
> --- a/fs/dlm/ast.c
> +++ b/fs/dlm/ast.c
> @@ -292,6 +292,8 @@ void dlm_callback_suspend(struct dlm_ls *ls)
>   		flush_workqueue(ls->ls_callback_wq);
>   }
>   
> +#define MAX_CB_QUEUE 25
> +
>   void dlm_callback_resume(struct dlm_ls *ls)
>   {
>   	struct dlm_lkb *lkb, *safe;
> @@ -302,15 +304,23 @@ void dlm_callback_resume(struct dlm_ls *ls)
>   	if (!ls->ls_callback_wq)
>   		return;
>   
> +more:
>   	mutex_lock(&ls->ls_cb_mutex);
>   	list_for_each_entry_safe(lkb, safe, &ls->ls_cb_delay, lkb_cb_list) {
>   		list_del_init(&lkb->lkb_cb_list);
>   		queue_work(ls->ls_callback_wq, &lkb->lkb_cb_work);
>   		count++;
> +		if (count == MAX_CB_QUEUE)
> +			break;
>   	}
>   	mutex_unlock(&ls->ls_cb_mutex);
>   
>   	if (count)
>   		log_rinfo(ls, "dlm_callback_resume %d", count);
> +	if (count == MAX_CB_QUEUE) {
> +		count = 0;
> +		cond_resched();
> +		goto more;
> +	}
>   }
>   
>

While that is a good thing to do, it looks like the real culprit here 
might be elsewhere. Look at what this is doing: it adds a large number 
of work items while holding the ls_cb_mutex, and the first thing each 
work item does is lock the lkb_cb_mutex. So on a multi-core system this 
creates a large number of work items, all of which will be fighting each 
other (and the thread that is trying to add new items) for the lock, so 
it is no wonder it doesn't work efficiently.

If we called the callbacks directly here, we would avoid all that 
fighting for the mutex and also remove the need to schedule the work 
items in the first place. That should greatly decrease the amount of 
CPU time required and reduce both latency and contention on the mutex.

Steve.



