[Cluster-devel] [PATCH] dlm: schedule during recovery loops

Patrick Caulfield pcaulfie at redhat.com
Wed Sep 26 13:52:23 UTC 2007


David Teigland wrote:
> On Wed, Sep 26, 2007 at 08:18:55AM +0100, Patrick Caulfield wrote:
>> David Teigland wrote:
>>> Call schedule() in a bunch of places where the recovery code loops
>>> through lists of locks.  The theory is that these lists become so
>>> long that looping through them triggers the softlockup watchdog.
>>> (Usually on ia64; it doesn't seem to happen often on other architectures.)
>>>
>>> Signed-off-by: David Teigland <teigland at redhat.com>
>>
>> I think we're encouraged to use cond_resched() instead these days. It has the
>> same effect but doesn't force a schedule if there is nothing else to run.
> 
> OK, I'd like to try cond_resched() instead. How certain are we that
> it's just as effective in avoiding the softlockup watchdog?  Testing it is
> going to be difficult since it's largely unreproducible outside of some
> single-CPU ia64 machines in the QE dept...


I can't see it making any real difference. If there is nothing else to run, the
process simply continues either way. With cond_resched() it continues cheaply;
with schedule() it enters the scheduler first and /then/ gets picked to run
again. If anything, cond_resched() should help because less time is spent in
the scheduler, though I doubt the difference is measurable.
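
To make the difference concrete, here is a rough userspace model of the two
yield points inside a list-walking loop. It is only a sketch, not kernel code
and not the actual dlm patch: need_resched_flag, schedule_calls,
fake_schedule(), fake_cond_resched(), build_chain() and walk_locks() are all
hypothetical names invented for illustration. The model assumes that
cond_resched() boils down to "enter the scheduler only if a reschedule is
pending".

```c
/*
 * Userspace model only. Nothing here is a kernel API; every name is a
 * hypothetical stand-in used to illustrate the argument.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lock_entry {
	struct lock_entry *next;
};

/* Models "some other task is waiting for the CPU". */
static bool need_resched_flag;

/* Counts how many times the loop entered the (fake) scheduler. */
static unsigned long schedule_calls;

/* Stand-in for schedule(): always enters the scheduler. */
static void fake_schedule(void)
{
	schedule_calls++;
	need_resched_flag = false;	/* the waiting task has now run */
}

/* Stand-in for cond_resched(): enters the scheduler only when needed. */
static void fake_cond_resched(void)
{
	if (need_resched_flag)
		fake_schedule();
}

/* Link n array elements into a singly linked, NULL-terminated list. */
static void build_chain(struct lock_entry *nodes, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i + 1 < n; i++)
		nodes[i].next = &nodes[i + 1];
	nodes[n - 1].next = NULL;
}

/*
 * Walk the list with one yield point per entry and return the number of
 * entries visited. With unconditional == true every iteration pays for a
 * scheduler entry; otherwise only iterations with work pending do.
 */
static size_t walk_locks(const struct lock_entry *head, bool unconditional)
{
	size_t visited = 0;

	for (const struct lock_entry *e = head; e != NULL; e = e->next) {
		visited++;
		if (unconditional)
			fake_schedule();
		else
			fake_cond_resched();
	}
	return visited;
}
```

In this model, walking the same list with nothing else runnable costs one
scheduler entry per lock in the unconditional variant and none in the
conditional one, while both variants still offer a yield point on every
iteration.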

-- 
Patrick
