[dm-devel] workqueues and percpu (was: [PATCH] dm: remake of the verity target)

Fri Mar 9 21:15:12 UTC 2012

Tejun Heo (tj at kernel.org) wrote:
> Hello,
> 
> On Thu, Mar 08, 2012 at 02:39:09PM -0800, Andrew Morton wrote:
> > > I looked at it --- and using percpu variables in workqueues isn't safe 
> > > because the workqueue can change CPU if the CPU is hot-unplugged.
> 
> Generally, I don't think removing preemption enable/disable around
> percpu variable access is a worthwhile optimization unles it's on
> really really really hot path.  We'll eventually add debug annotations
> to percpu accessors and the ones used outside proper preemption
> protections would need to be updated anyway.
> 

In this case, I need the per-cpu data for the duration of calculating
a cryptographics hash on a 4K page of data. That's a long time to disable
pre-emption.

I could fix the bug temporarily by adding get/put for the per_cpu data
but would that be acceptable? I'm not sure what the OK limit is for how
long one can disable preemption. An alternative fix would be not allow
CONFIG_VERITY when CONFIG_HOTPLUG_CPU. Once workqueues are fixed, I could
remove that restriction.

Thoughts?

> > > dm-crypt has the same bug --- it also uses workqueue with per-cpu 
> > > variables and assumes that the CPU doesn't change for a single work item.
> > > 
> > > This program shows that work executed in a workqueue can be switched to a 
> > > different CPU.
> > > 
> > > I'm wondering how much other kernel code assumes that workqueues are bound 
> > > to a specific CPU, which isn't true if we unplug that CPU.
> > 
> > ugh.
> > 
> > We really don't want to have to avoid using workqueues because of some
> > daft issue with CPU hot-unplug.
> 
> Using or not using wq is orthogonal tho.  Using kthreads directly
> requires hooking into CPU hotplug callbacks and one might as well call
> flush_work_sync() from there instead of shutting down kthread.
> 
> > And yes, there are assumptions in various work handlers that they
> > will be pinned to a single CPU.  Finding and fixing those
> > assumptions would be painful.
> >
> > Heck, even debug_smp_processor_id() can be wrong in the presence of the
> > cpu-unplug thing.
> 
> Yeah, that's a generic problem with cpu unplug.
> 
> > I'm not sure what we can do about it really, apart from blocking unplug
> > until all the target CPU's workqueues have been cleared.  And/or refusing
> > to unplug a CPU until all pinned-to-that-cpu kernel threads have been
> > shut down or pinned elsewhere (which is the same thing, only more
> > general).
> > 
> > Tejun, is this new behaviour?  I do recall that a long time ago we
> > wrestled with unplug-vs-worker-threads and I ended up OK with the
> > result, but I forget what it was.  IIRC Rusty was involved.
> 
> Unfortunately, yes, this is a new behavior.  Before, we could have
> unbound delays during unplug from work items.  Now, we have CPU
> affinity assumption breakage.  The behavior change was primarily to
> allow long running work items to use regular workqueues without
> worrying about inducing delay across cpu hotplug operations, which is
> important as it's also used on suspend / hibernation, especially on
> mobile platforms.
> 
> During the cmwq conversion, I ended up auditing a lot of (I think I
> went through most of them) workqueue users and IIRC there weren't too
> many which required stable affinity.
> 
> > That being said, I don't think it's worth compromising the DM code
> > because of this workqueue wart: lots of other code has the same wart,
> > and we should find a centralised fix for it.
> 
> Probably the best way to solve this is introducing pinned attribute to
> workqueues and have them drained automatically on cpu hotplug events.
> It'll require auditing workqueue users but I guess we'll just have to
> do it given that we need to actually distinguish the ones need to be
> pinned.  Or maybe we can use explicit queue_work_on() to distinguish
> the ones which require pinning.
> 
> Another approach would be requiring all workqueues to be drained on
> cpu offlining and requiring any work item which may stall to use
> unbound wq.  IMHO, picking out the ones which may stall would be much
> less obvious than the ones which require cpu pinning.
> 
> Better ideas?
> 
> Thanks.
> 
> -- 
> tejun