ptrace cleanup tasks

Tue Apr 14 02:58:20 UTC 2009

> I tried to think about the first steps in ptrace-cleanup, and I
> need your help.

I think I said that list was "not necessarily in this order" and I
certainly meant it so.  I hope you haven't been slowed in proceeding on any
of the pieces that are more straightforward and can be done mostly
orthogonal to the sticky ones.  The locking change is the real hairy one
that I expected we'd have to spend a while hashing out.

> How should it nest with tasklist_lock? I don't think we should
> take ->ptrace_lock under tasklist. Instead, tasklist_lock should
> nest inside ->ptrace_lock, this means it could be ->ptrace_mutex.
> Otherwise, for example, it is not clear how can exit_ptrace()
> change tracee->exit_state.

We want the new lock to be a mutex.  I don't think we want to hold both
locks, though it won't hurt to have tasklist_lock taken (appropriately
briefly) while holding the mutex.

> But, looking at do_wait(), I can't understand what can we do right
> now. Looks like we we have to move ptrace_do_wait() to another
> loop. But this doesn't look as a cleanup, this will complicate
> the code even more.

I don't think we have to do that.  But I want to point out here that the
big-picture cleanup we are getting here is having ptrace machinations never
take tasklist_lock.  That is not something you can see as an immediate good
by reading diffstats or whatever such micro-level view of "cleanup".  But
with a larger view, it is a huge boon for the desireable characteristics in
the kernel overall, in performance scaling, proper separation of concerns,
etc.  So this large good is what to consider against the smaller costs of
some repetition or complexity in the code at the micro-level.  (Of course,
we always want to make it as clean and concise as we can at every level.)

> Perhaps, we can simplify things if we add ->ptrace_mutex to
> signal_struct, not to task_struct. (actually, I think ->ptraced
> and ->children should go to signal_struct too). [...]

This opens large new cans of worms that I don't think we want to get into.
(Moving ->ptraced implies a change to ptrace semantics that is its own big
can of worms.  Moving ->children is a wholly non-ptrace issue that we can
consider quite separately and we should not conflate it with this work.)

> Thoughts?

The tasklist_lock for our purposes here is only taken to keep the
next_thread() loop valid.  (For non-ptrace do_wait_thread() it serves other
purposes too, but leave that aside for now.)  In ptrace_do_wait(), we just
need to make sure that loop (in its caller) does not go awry.  What I think
we can do is some version of:

	get_task_struct(tsk);
	read_unlock(&tasklist_lock);
	mutex_lock(&tsk->ptrace_mutex);
	...
	mutex_unlock(&tsk->ptrace_mutex);
	read_lock(&tasklist_lock);
	... check for tsk->thread_group having been invalidated ...
	put_task_struct(tsk);
	...

There are a few angles to cover in this approach.  I haven't thought these
all through, but I suspect we can iron them all out.

1. We really want to short-circuit and not do any lock fiddling in the
   common case of list_empty(&tsk->ptraced).  It's an unlocked racy check
   since only the mutex will lock that list.  But it is probably not too
   hard to work out a way to do it where we are confident that any
   false-positive case (i.e. we read as empty when racing with tsk or its
   traceme child adding to the list) is tolerable.  That is, that any time
   we short-circuited, we can be sure a wait_chldexit wakeup will follow to
   get us another iteration if it matters.  (Or that it just doesn't
   matter, because the attach/traceme can be said to be "after" the wait
   call for this race.)

2. After retaking tasklist_lock, we need to detect when release_task(tsk)
   has made next_thread(tsk) potentially invalid.  This should be the rare
   race case, so it needs to be robust, but not real fast nor rule out all
   the false-positives (where next_thread() would really have been fine).
   Probably good enough to do:

	if (!ret) {
		read_lock(&tasklist_lock);
		if (unlikely(tsk->exit_state == EXIT_DEAD)) {
			read_unlock(&tasklist_lock);
			ret = -ERESTARTNOINTR;
			set_thread_flag(TIF_SIGPENDING);
	}
	put_task_struct(tsk);
	return ret;

   Given that it dying would early get to ptrace_exit() and take the mutex
   there, this might even be somehow moot.  That is, it couldn't have
   exited while we held the mutex, so maybe:

	read_lock(&tasklist_lock);
	mutex_unlock(&tsk->ptrace_mutex);

   just does it, if that pattern doesn't freak out lockdep or anything.
   (Probably it's not really that simple.)

3. I thought there was at least a third one, but I'm not seeing it now.
   You'll probably find more holes in the scheme. :-)

Thanks,
Roland