Stopped detach/attach status

Roland McGrath roland at redhat.com
Mon Oct 12 00:29:23 UTC 2009


> Yes. In particular, ptrace(PTRACE_DETACH, SIGKILL) should cancel
> SIGNAL_STOP_STOPPED, yes?

Yes.

> > > 	-			sig->flags = SIGNAL_STOP_STOPPED;
> > > 	+			sig->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED;
> >
> > Boy, do I not understand why that does anything about this at all!
> > But I am barely awake tonight.  Ok, I guess I do sort of if it goes
> > along with some other patch to set SIGNAL_STOP_STOPPED.  But since
> > you've verified you really understand what happens, you can tell us!

I actually thought of it right after I sent this, but I was too tired to
follow up then.  It's good that you've posted this particular concrete
scenario to document it more fully.  Here's the way I think about that:

SIGNAL_STOP_DEQUEUED exists for one purpose.  It's to ensure that SIGCONT
and SIGKILL can clear it, completing their required effect of clearing
all pending stop signals.  (It fills the hole when another thread has
dequeued a stop signal and then dropped the siglock to make its call to
is_current_pgrp_orphaned()--so that half-delivered signal is still
considered "pending" and thus must be cancelled by SIGCONT or SIGKILL.)
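
A minimal user-space sketch of that handshake, for illustration only.  The
bit values and the model_* names are assumptions invented for this example;
this is not the kernel code, just the flag logic described above.

```c
#include <stdbool.h>

/* Illustrative bit values for this user-space model only. */
#define SIGNAL_STOP_STOPPED   0x1u
#define SIGNAL_STOP_DEQUEUED  0x2u

/* Hypothetical stand-in for the shared per-process signal state. */
struct model_signal {
	unsigned int flags;
};

/* A thread has dequeued a stop signal and is about to drop the siglock. */
static void model_dequeue_stop(struct model_signal *sig)
{
	sig->flags |= SIGNAL_STOP_DEQUEUED;
}

/* SIGCONT or SIGKILL arrives: cancel the half-delivered stop as well. */
static void model_cancel_stops(struct model_signal *sig)
{
	sig->flags &= ~SIGNAL_STOP_DEQUEUED;
}

/* The thread retakes the lock: may the stop still be delivered? */
static bool model_stop_still_pending(const struct model_signal *sig)
{
	return (sig->flags & SIGNAL_STOP_DEQUEUED) != 0;
}
```

If model_cancel_stops() runs between model_dequeue_stop() and the final
check, the stop is dropped; otherwise it goes ahead.  How long the window
stays open does not change the logic.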

In the debugger case, there is a far larger hole possible, where a thread
has dequeued a stop signal and then dropped the siglock to block for an
arbitrary period while the debugger contemplates the signal.  But to me
this is really the same case as far as the signal semantics are concerned.
When the debugger decides to send the signal on, it then picks up in the
same "half-delivered" situation and goes the rest of the way.

What I've just described is a simple "race" with an external SIGCONT or
SIGKILL.  This maps exactly to the is_current_pgrp_orphaned() window--it's
just a window that can easily be far larger and can be kept open forever.
So a debugger user with a global perspective can observe it as a
"non-racy" hole (hit SIGTSTP in the debugger, send SIGCONT from another
terminal, continue in the debugger).

Now, the case we are considering really is different from that race.
But I think the same essential logic applies: you have a half-delivered
stop signal "in flight", so either there has been a SIGCONT or SIGKILL
to cancel it, or there hasn't.  Since there hasn't, nothing should
prevent the normal operation of that stop signal's final delivery.
It's a bug that something does.

Another way to put it is to say that the "exists for one purpose"
statement above implies that only an actual SIGCONT or SIGKILL should
ever clear SIGNAL_STOP_DEQUEUED.  In fact, only one place clears the
flag explicitly, but six others do so implicitly.  The one explicit
place and one of the implicit places are both in the code that clearly
should clear it: the SIGCONT case in prepare_signal().
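
To make "implicitly" concrete, here is a small user-space sketch.  The bit
values and function names are assumptions for illustration, not kernel
code.  The point is only that a whole-word assignment to ->flags drops
SIGNAL_STOP_DEQUEUED without ever naming it.

```c
/* Illustrative bit values for this user-space model only. */
#define SIGNAL_STOP_STOPPED   0x1u
#define SIGNAL_STOP_DEQUEUED  0x2u

/* Explicit clear: the statement names the bit it removes. */
static unsigned int clear_explicit(unsigned int flags)
{
	return flags & ~SIGNAL_STOP_DEQUEUED;
}

/*
 * Implicit clear: models "->flags = SIGNAL_STOP_STOPPED".  The old value
 * is discarded entirely, so SIGNAL_STOP_DEQUEUED goes with it.
 */
static unsigned int clear_implicit(unsigned int flags)
{
	(void)flags;	/* old value is intentionally ignored */
	return SIGNAL_STOP_STOPPED;
}
```

Both forms leave SIGNAL_STOP_DEQUEUED unset, but only the first is easy
to find when grepping for the flag name.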

Three implicit places are the ->flags = SIGNAL_GROUP_EXIT cases
(zap_process, do_group_exit, complete_signal).  These are harmless
because a group exit and a pending stop are already effectively mutually
exclusive, since the one check of SIGNAL_STOP_DEQUEUED is:

		if (!likely(sig->flags & SIGNAL_STOP_DEQUEUED) ||
		    unlikely(signal_group_exit(sig)))
			return 0;
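
A user-space model of that check shows why losing SIGNAL_STOP_DEQUEUED
makes no difference once a group exit is under way.  The bit values and
model_* names are assumptions for illustration, and model_group_exit() is
a simplification: the kernel's signal_group_exit() may look at more state
than this one flag.

```c
#include <stdbool.h>

/* Illustrative bit values for this user-space model only. */
#define SIGNAL_STOP_DEQUEUED  0x2u
#define SIGNAL_GROUP_EXIT     0x8u

/* Simplified stand-in for signal_group_exit(): checks one flag only. */
static bool model_group_exit(unsigned int flags)
{
	return (flags & SIGNAL_GROUP_EXIT) != 0;
}

/* Mirrors the quoted condition: true means the stop is refused. */
static bool model_stop_refused(unsigned int flags)
{
	return !(flags & SIGNAL_STOP_DEQUEUED) || model_group_exit(flags);
}
```

With SIGNAL_GROUP_EXIT set, model_stop_refused() is true whether or not
SIGNAL_STOP_DEQUEUED survived the assignment.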

The remaining two places are the ->flags = SIGNAL_STOP_STOPPED cases in
do_signal_stop and exit_signals.  Since SIGNAL_STOP_DEQUEUED must always
have been set already by the time you get to those situations, it is
harmless to use "->flags = SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED"
instead of:

	sig->flags &= SIGNAL_STOP_DEQUEUED;
	sig->flags |= SIGNAL_STOP_STOPPED;

or anything like that.  It's probably cleanest to consolidate those two
cases to call a single subroutine that does the tracehook_notify_jctl
logic, unlock and do_notify_parent_cldstop.  It can take a caller flag
or just check PF_EXITING to omit the ->exit_code + ->state change of the
do_signal_stop version of the code.  That one subroutine can have a
clear comment about the nonobvious flag usage.
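
For reference, the equivalence of the two forms can be checked with a
small user-space sketch.  The bit values and function names are
assumptions for illustration, not kernel code.

```c
/* Illustrative bit values for this user-space model only. */
#define SIGNAL_STOP_STOPPED   0x1u
#define SIGNAL_STOP_DEQUEUED  0x2u

/* The one-line form: always ends with both bits set. */
static unsigned int stop_flags_assign(unsigned int flags)
{
	(void)flags;	/* old value is intentionally ignored */
	return SIGNAL_STOP_STOPPED | SIGNAL_STOP_DEQUEUED;
}

/* The mask-then-set form: keeps DEQUEUED only if it was already set. */
static unsigned int stop_flags_masked(unsigned int flags)
{
	flags &= SIGNAL_STOP_DEQUEUED;
	flags |= SIGNAL_STOP_STOPPED;
	return flags;
}
```

The two agree for every input that has SIGNAL_STOP_DEQUEUED set, which
is the only case said to reach these paths.  They differ only when the
bit was clear beforehand, so the shorter form relies on that precondition
and deserves the comment.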

> But please remember, the patch above is not complete of course and currently
> I do not see the good solution. 

What's incomplete aside from handling the exit_signals case the same way?

> I am starting to think we should forget
> about these bugs, merge utrace-ptrace, and then try to fix them.

If we can have utrace-ptrace code whose corner behavior matches the old
code and is itself clean, then I don't care about the order of the
changes going in.  But it's not really clear to me that we can even
describe the old behavior in terms clean enough to make an exact
work-alike implementation that could possibly be clean.


Thanks,
Roland



