utrace-ptrace && detach with signal semantics

Mon Oct 12 03:55:05 UTC 2009

I'm replying out of order.  So I'm sorry I started rehashing some of this
same stuff in the other thread before reading where you'd already referred
to it.

> What if the tracee reports a signal and stops, the tracer detaches
> but does not wake it up because of SIGNAL_STOP_STOPPED ? In this
> case this signal will be visible after the next attach.
> 
> OK, currently ptrace_detach() unconditionally wakes up the tracee.
> But we seem to agree this wakeup should be killed, it completely
> breaks the group-stop logic and it has other problems.

So today the manifest semantics in userland is that PTRACE_DETACH from a
signal stop causes immediate delivery of the signal in that thread
regardless of the group stoppedness.  In a (converted) job stop or
non-signal, non-syscall ptrace stop the tracee resumes and drops the passed
signr on the floor.  Those semantics are very clear for the single-threaded
case at least, so we can just be compatible.  

In MT cases where the result is a group-exit, that is pretty consistent
and reliable today, so we probably need to be compatible there too.  In
MT cases where the result is to flutter back into job stop (SIGSTOP et
al), then there too it is reliable today that you'll get another SIGCHLD
and wait wakeup to go with that "freshly stopped again".

In MT cases where the result is to confuse the group-stop state, i.e.
tracee runs (signal handler or ignored signal) while group is supposedly
stopped, we probably don't break anything not already broken by cleaning
it up somehow.  But if the tracer does a loop of PTRACE_DETACH on each
tracee thread, then the result today is probably pretty consistent in
manifest behavior today: they all get back to running, even if the
SIGNAL_STOP_STOPPED flag is still set.  But I'm not at all sure anything
could discern from userland that SIGNAL_STOP_STOPPED is still set.  It
will be set anew or cleared by the time it actually matters.  So maybe
it really is consistent enough with today's observable behavior just to
have PTRACE_DETACH clear SIGNAL_STOP_STOPPED.  I'm guessing that will be
pretty well consistent in the loop-of-detach case (by the end of it).
I'm not really sure what would go different in an oddball case where you
PTRACE_DETACH'd one thread but the others were actually left stopped
(e.g. in true job stop and you only ever attached the one thread).  But
it probably doesn't much matter, that usage mode must already be pretty
broken I would think.

> If "monitor" really wants to kill the tracee it should use SIGKILL ;)

Say it wants a core dump.

> Any other fatal signal can be ignored by the tracee's sub-thread after
> detach but before the tracee returns from ptrace_signal().

Say they never tested the MT case before and haven't started caring
about it now.  They just want what they have observed before to stay the
same as they saw with the old kernel.

> Since detach_signal() actually sends the signal, ptrace(DETACH, SIGKILL)
> really works: it wakes up SIGNAL_STOP_STOPPED tracee.

Today's manifest behavior (via the bogus wake_up_process) is that this
SIGKILL will get through and take effect (in signal stops, as above).
So that should stay.  

As to the theory of things, this is the flip side of the whole
SIGNAL_STOP_DEQUEUED issue I just posted about earlier.  Whereas only an
earlier SIGCONT/SIGKILL would have cleared stop signals and thus should
clear any "pending-equivalent" "half-delivered" stop signals, earlier
stop signals would have cleared a pending SIGCONT.  I'm not entirely
sure what should happen in all the SIGCONT cases, but we can probably
enumerate them and hash those out.  I doubt anybody is depending much on
the corners of that.

For the SIGKILL cases, it's more clear.  No matter what, if a SIGKILL
doesn't just get dropped on the floor (non-signal stop cases), then it's
going to override everything else and soon.

> Lets discuss how we can implement detach to be compatible...
> We already discussed detached_ops. So, we either change engine->ops
> (my choice) or add the new engine with ->ops = detached_ops and detach
> the ptrace engine. 

I don't recall all of the earlier discussion about this in detail.
But yes, they are equivalent and I think a fresh engine is cleaner.

> We need detached_report_signal() callback which
> fixups siginfo if signr != info->si_signo and returns
> UTRACE_DETACH | UTRACE_SIGNAL_DELIVER.

Right, ptrace_report_signal does that now with ->resume = UTRACE_DETACH.

> How can detached_report_signal() set info->si_pid correctly? the
> tracer has gone away. This probably means this info should be prepared
> during detach, the tracer should fill context->siginfo.

It means that if the tracer cared at all, he did PTRACE_SETSIGINFO so
this would be moot.  The vanilla kernel today uses si_pid and si_uid
values of what's either the new tracer (with racing PTRACE_ATTACH) or
the real parent at the time of actual resumption.  It can be any of
those values or 0 and I think it's fine.

> But the real problem, I can't see how can we teach ptrace_report_signal()
> to not play with parting signals. Do you have any idea? perhaps we can
> introduce UTRACE_SIGNAL_DELIVER_I_AM_DETACHED_ENGINE, but this is ugly.

I'm not sure I'm following you.  Do you mean the new attach after the
detach?  Indeed, we don't want this to see the detached engine's action
as "incoming state" and report it again.  I guess I would arrange that
you never have them both attached and active (i.e. events set).  So,
attach the new engine, then look up for any existing detached engine.
If you find one, you can do one of two things.

1. Take over the old engine's parting signal and mark it in your own
   context.  Then detach the old engine.  You set the context to the
   same state it's in after you have resumed for PTRACE_CONT,signr.
   Your report_signal callback can always get UTRACE_SIGNAL_REPORT
   callbacks in that state due to other engines stopping and resuming,
   so it should already have logic for that state to deliver the signal.
2. Just mark a special state in your context saying that you should do
   nothing on the next UTRACE_SIGNAL_DELIVER report you see.  The old
   engine will be earlier on the list and run first, getting
   UTRACE_SIGNAL_REPORT and returning UTRACE_SIGNAL_DELIVER|UTRACE_DETACH.
   Then your engine runs later, and knows to let this one signal pass
   through, resetting its context for the next real signal report.

> Oh, I'd like to avoid detached_ops. Perhaps the new attach can re-use
> the old engine/context?

Both can be basically equivalent.  I'm not clear on why you prefer that.

> ptrace(DETACH, SIGNR) does:
> 
> 	context->detached = true;  // checked in ptrace_report_signal()
> 	return utrace_control(INTERRUPT);
> 
> Now, how ptrace(ATTACH) can clear ->detached and re-use this engine without
> utrace->lock ? IOW, will you agree to abuse utrace->lock for this?

That sounds insane.

> OR. Perhaps we can introduce UTRACE_ATTACH_MATCH_FILTER_DATA ?

This sounds more insane.

I don't understand why you wouldn't just do a normal MATCH_OPS lookup
and then look at the data structure.  You could even do CREATE|MATCH_OPS
without EXCLUSIVE, and then if (engine->data != ctx) drop your fresh ctx
and use ctx=engine->data; use it if ctx->detached and -EPERM if not.

> 		if (!IS_ERR(engine)) {
> 			// We re-used the old engine, but perhaps it is
> 			// going to return UTRACE_DETACH ?
> 			utrace_barrier(engine);
> 			if (engine->ops == utrace_detached_ops)
> 				engine = -EANY;

That's what the return value from utrace_barrier is for.

> What do you think?

I think the code you sent is better than some of these ideas.  But
it's still not real coherent to me.  I feel like you are worrying
about particular problems in contorting to some of this really strange
hair, but I haven't become clear on what those problems are so as to
understand what clean solutions to them might be.

Thanks,
Roland