Q: utrace->stopped && utrace_report_jctl()

Roland McGrath roland at redhat.com
Thu Mar 12 07:36:52 UTC 2009


> I'd like to ask you to clarify what utrace->stopped means...

I'm very glad you are looking into this area!

> My understanding is: if we see ->stopped == true under utrace->lock, then
> the target can do nothing "interesting" from the utrace's pov. The target
> should take utrace->lock at least once. Either in finish_utrace_stop(), or,
> if ->stopped was set by do_signal_stop() path, the target will call
> tracehook_get_signal()->utrace_get_signal(). So we can assume the target
> is "quiescent" and we can do, for example, UTRACE_DETACH safely.

Correct.

> But utrace_report_jctl() doesn't look right to me,
> 
> 	spin_lock(&utrace->lock);
> 	utrace->stopped = 0;
> 	utrace->report = 0;
> 	spin_unlock(&utrace->lock);
> 
> I must admit, I dont't understand the comment above, but obviously this is
> right, we should clear ->stopped. If nothing else, REPORT()->start_report()
> won't be happy if ->stopped.

The comment mentions "utrace being removed", which is a bit of old text
referring to an indirect struct utrace.  Aside from that, please tell me
what is not clear about that comment.

> But ->stopped can be restored right after we clear it! Yes, utrace_do_stop()
> and utrace_set_events() set ->stopped == 1 only if ->utrace_flags has no JCTL,
> and since we are here we must have JCTL.

That's indeed the logic intended to prevent ->stopped being set again here.

> But, if we enter utrace_report_jctl() with ->stopped == 1, JCTL can be
> already removed from ->utrace_flags, exactly because ->stopped was true.

I don't follow this.  JCTL is never "removed" from ->utrace_flags, except
as all event bits are, by utrace_reset().

> This leads to another minor question, how it is possible to enter enter
> utrace_report_jctl() with ->stopped == 1 ? I think the only possibility
> it was previously set by another call to utrace_report_jctl(), see below.

There are two ways to enter utrace_report_jctl with ->stopped set.

1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then.
   Now utrace_report_jctl is called for the CLD_CONTINUED case, and
   ->stopped remains set.

2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
   already in TASK_STOPPED (and really stopped, or at least got past
   tracehook_notify_jctl before JCTL was set).  It sets ->stopped before
   adding JCTL to ->utrace_flags, so that utrace_control() will consider
   the target stopped.

> SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in
> progress and it is not finished yet. 

SIGNAL_STOP_STOPPED should be reliable, as far as it goes.  It will only be
set if the group stop is complete.  If then a SIGCONT+stop signal come,
SIGCONT will clear SIGNAL_STOP_STOPPED before the stop signal starts
another group stop.  (We have no bad old PTRACE_CONT implementation to
conflict with here.)

> But ->group_stop_count is not reliable too. It it possible that we
> recieved SIGCONT and then another SIGSTOP. If another thread has already
> dequeued this SIGSTOP and initiated the new group stop, we can't just set
> TASK_STOPPED, we must participate in the ->group_stop_count accounting.

It's worse than that!  If we came out of TASK_STOPPED, we did it implicitly
and without holding the siglock.  

We participated in group_stop_count accounting for the first stop before we
got here.  If we stayed in TASK_STOPPED throughout the callbacks, then that
bookkeeping is still correct.

If the initiation of the new group stop happened while we were in
TASK_STOPPED, we were omitted from the count but we should stop again.  
In that case we should stop either if SIGNAL_STOP_STOPPED is set or if
group_stop_count > 0.  Since we weren't counted, if group_stop_count==0
then SIGNAL_STOP_STOPPED will be set (again).

If that initiation happened while a callback (e.g.) blocked in kmalloc or
after (i.e. we were not in TASK_STOPPED), we were included in that count.
In that case we need to decrement group_stop_count and stop again, but
possibly also need to call do_notify_parent_cldstop again if it was 1.  For
that we'd do the right thing just by returning in TASK_RUNNING.  We'll just
come right back around in get_signal_to_deliver and handle group_stop_count
normally.

The trouble is that we have no way to distinguish these two cases, i.e. to
know whether or not we were counted in group_stop_count.  Am I missing a
way?  (The one piece of information we are not using is the @notify
argument: it tells us whether we were the thread responsible for setting
SIGNAL_STOP_STOPPED just before we got here.  But I don't see how that helps.)

I think the bottom line is that we can't ever allow any transition to or
from TASK_STOPPED when we don't hold the siglock.  Every such transition
must hold that lock to manage group_stop_count and SIGNAL_STOP_STOPPED.

That suggests we must preemptively go back to TASK_RUNNING before making
the callbacks, just in case they would do the transition.  We'd take the
siglock and manage the bookkeeping.  But I'm not sure yet how best to do
that.  I'm not sure if we can safely clear SIGNAL_STOP_STOPPED momentarily
after it's been set.  This all happens before do_notify_parent_cldstop is
called, which avoids a whole can of worms about do_wait() I was starting to
worry about.  Hmm.  Seems like there should be something we can do using
group_stop_count and/or checking the SIGNAL_CLD_* bits to notice a SIGCONT
having come in.

> 
> 	if (task_is_stopped(current)) {
> 		/*
> 		 * While in TASK_STOPPED, we can be considered safely
> 		 * stopped by utrace_do_stop() only once we set this.
> 		 */
> 		spin_lock(&utrace->lock);
> 		utrace->stopped = 1;
> 		spin_unlock(&utrace->lock);
> 
> I think this is correct, but it is not easy to understand. SIGCONT may
> come right after the task_is_stopped() check, so this _looks_ racy.

Right, all that matters is that we are always on a path that goes back
through utrace_get_signal() before doing anything else utrace thinks about.

> But, nobody should clear ->utrace_flags without calling utrace_wakeup()
> which clears ->stopped too. 

Right.

> This means that the target can't escape
> from get_signal_to_deliver() with the ->stopped == 1. 

Right, that is the core invariant of all ->stopped logic.

> And in fact, we could check was_stopped instead of task_is_stopped().

Right.  If we were resumed rather than actually stopping now, then
->stopped will be cleared shortly anyway.  Since we have one test or the
other here anyway, the fresh test is a free way to optimize out the lock
and set when it happens to be that case.  (Not that it matters to optimize
this case, but it's free.)

> Is my understanding correct?

I think so.

> But! can't we miss utrace_wakeup() ?

I think you've found something (though not quite the scenario you describe).

> Let's suppose the debugger D attaches the single engine E to the target T.
> 
> 	D does utrace_set_events(JCTL).
> 
> 	T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
> 	utrace_report_jctl().
> 
> 	D does utrace_set_events(events => 0), this clears E->flags.
> 
> 	T calls REPORT(), nobody needs needs JCTL, finish_report() sees !->takers
> 	and calls utrace_reset(). It sets ->utrace_flags = 0.

Nope:

			flags |= engine->flags | UTRACE_EVENT(REAP);

If there are any engines left on the list, ->utrace_flags is never zero.

So, change your scenario to:

	D does utrace_control(UTRACE_DETACH).

and then this will happen.

> 	T checks task_is_stopped(), sets ->stopped = 1.

Right.  In the utrace-indirect code, this was even worse!  The dangling
utrace pointer was invalid and should not have been used at all (it should
have fetched the new one under RCU).

> Now, when T is woken by SIGCONT, it returns to user-space bypassing all utrace
> hooks, and runs with ->stopped == 1. This doesn't look right. Say, D can do
> utrace_set_events(ANY) and then T hits start_report()->BUG_ON(utrace->stopped).

Right.  I think it's made safe with:

	if (task_is_stopped(task) &&
	    (task->utrace_flags & UTRACE_EVENT(JCTL))) {

In fact, just task->utrace_flags != 0 would be safe.  But only if JCTL is
set do we actually need to set ->stopped here.  (Otherwise, it will get set
later by utrace_do_stop or utrace_set_events.)


Thanks,
Roland




More information about the utrace-devel mailing list