Q: ptrace_detach() && UTRACE_DETACH

Fri Aug 7 19:45:41 UTC 2009

> 1. Suddenly I realized, I do not really understand why ptrace_attach()
>    tries to reuse the "almost detached" engine. Can't attach just fail
>    in this case as if the task is still ptraced?

No (unless it's actually a race with the first PTRACE_DETACH call).
Consider:

	ptrace(PTRACE_DETACH, pid, 0, SIGUSR1);
	ptrace(PTRACE_ATTACH, pid, 0, 0);

Assume the tracee doesn't get scheduled in between those two calls.
The caller knows synchronously (assuming the error checking omitted above)
that the tracee is no longer traced after PTRACE_DETACH, so calling
PTRACE_ATTACH again is correct and raceless.

However, the behavior requirement is that the tracee deliver (not queue)
the SIGUSR1 before the tracer can see it do anything else after the second
attach.
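
With the error checking put back, that sequence might look like the
sketch below.  The helper name detach_then_reattach() is made up for
illustration; only ptrace(2) and waitpid(2) are real calls.  It assumes
the caller is already attached to pid and the tracee is in a ptrace stop.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not from any patch: detach from a tracee that
 * is in a ptrace stop, leaving it the parting signal `sig`, then
 * attach again straight away.
 *
 * Returns the stop signal of the first stop reported after the second
 * attach, or -1 if any step fails (errno is left as that step set it).
 */
int detach_then_reattach(pid_t pid, int sig)
{
	int status;
	pid_t w;

	/* Once this succeeds, the caller knows the tracee is untraced. */
	if (ptrace(PTRACE_DETACH, pid, NULL, (void *)(long)sig) == -1)
		return -1;

	/* So attaching again right away is a legitimate request. */
	if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
		return -1;

	w = waitpid(pid, &status, 0);
	if (w == -1)
		return -1;
	assert(w == pid);

	/* The tracee may have died, e.g. if `sig` was fatal to it. */
	if (!WIFSTOPPED(status))
		return -1;

	/*
	 * Requirement under discussion: the new attach must get no chance
	 * to intercept `sig`, neither at this stop nor at any later one.
	 */
	return WSTOPSIG(status);
}
```

On failure the caller can inspect errno; for example, a pid that does
not exist makes the first call fail and the helper return -1.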

(BTW, however the new code you write comes out, please make sure it is
full of comments that explain these sorts of subtleties that constrain
the design.)

>    ptrace_detach() always wakes up the tracee. This means it should call
>    utrace_get_signal() soon and complete the detach.

Right.  But "soon" is not a guaranteed ordering.

>    But, until we change/fix this unconditional wakeup, is there any other
>    reason why the new debugger should try to reuse it?

I don't think it matters specifically that it reuse the old engine.
That's an implementation detail you can change.  What matters is that
the parting signal gets delivered.  The "almost detached" engine is the
way that happens.  Its final report_signal callback needs to run before
the new engine first processes any signal.

The new engine should have no way to intercept the parting signal.
i.e., it happens before the first possible ptrace stop (of the second
attach).  If you were to attach the new engine along with the lingering
old engine, then it would get a report_signal callback as if the parting
signal were a newly-dequeued signal.  So you have to avoid that somehow.
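
To pin down the ordering, here is a toy userspace model of the
constraint.  Nothing in it is utrace API: struct model_tracee and
model_next_signal() are invented for this sketch, and the real code
paths are more involved.  It only shows that the lingering engine's
final callback consumes the parting signal before the new engine is
offered anything.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_MAX = 8 };

/* Invented model state; not a kernel structure. */
struct model_tracee {
	int parting_sig;		/* nonzero while the old engine lingers */
	bool new_attached;		/* a second tracer has attached */
	int new_saw[MODEL_MAX];		/* signals offered to the new engine */
	size_t new_nsaw;
	int delivered[MODEL_MAX];	/* signals actually delivered */
	size_t ndelivered;
};

/*
 * One pass through signal processing in the tracee.  `sig` is the
 * newly dequeued signal, or 0 for a pass forced only by an interrupt.
 */
void model_next_signal(struct model_tracee *t, int sig)
{
	/*
	 * The "almost detached" engine's final callback runs first.
	 * It delivers the parting signal and then goes away; the new
	 * engine is not consulted about this signal at all.
	 */
	if (t->parting_sig != 0) {
		assert(t->ndelivered < MODEL_MAX);
		t->delivered[t->ndelivered++] = t->parting_sig;
		t->parting_sig = 0;
	}

	if (sig == 0)
		return;

	if (t->new_attached) {
		/* Only genuinely new signals reach the new engine. */
		assert(t->new_nsaw < MODEL_MAX);
		t->new_saw[t->new_nsaw++] = sig;
	} else {
		assert(t->ndelivered < MODEL_MAX);
		t->delivered[t->ndelivered++] = sig;
	}
}
```

Starting from parting_sig = SIGUSR1 with new_attached set, a pass for
the attach's SIGSTOP leaves SIGUSR1 in delivered[] and only SIGSTOP in
new_saw[].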

> 2. Or. Perhaps we can add ptrace_utrace_detached_ops ? All methods should
>    return UTRACE_DETACH, except ptrace_utrace_detached_ops->report_signal(),
>    which fixes up ->last_siginfo and returns UTRACE_SIGNAL_XXX | UTRACE_DETACH.

I'm not entirely positive there aren't any cases where another callback
would be made before the report_signal.  I think the only one where a
plain UTRACE_DETACH would be correct is report_death (for SIGKILL).
Otherwise, a UTRACE_DETACH that abandons the parting signal is wrong.
There may well be none, since it has to be in a ptrace-reported stop
already before PTRACE_DETACH works.  But it needs more thought.

>    ptrace_detach_task() sets engine->ops = ptrace_utrace_detached_ops before
>    utrace_control(UTRACE_INTERRUPT). We don't even need utrace_barrier().
> 
>    This means that the new debugger can attach another engine.
> 
>    Do you think this can work?

Perhaps.  In the utrace API it is not really kosher ever to change the
->ops pointer.  But this just means we would have to carefully examine
the utrace code paths and say what the true rules are about when and how
changing ->ops is safe.

> 3. A bit off-topic question. I can't understand ptrace_detach(sig) with
>    ptrace_report_syscall().
> 
>    Currently (without utrace), if we detach when the tracee sleeps after
>    ptrace_report_syscall()->ptrace_notify(), we set ->exit_code = sig and
>    the tracee sends this sig to itself after wakeup.

Right.

>    But, utrace-ptrace does this differently. report_syscall_xxx() does not
>    play with signals; instead, when the ptracer does PTRACE_CONT/etc, we send
>    the signal to the tracee before wakeup.

report_syscall_* run before the stop.  What signal (if any) to send is
not chosen until the wakeup (obviously, since it's an argument to the
waking ptrace call).  In the case of syscall entry, there isn't any
place in the tracee where we have a hook that runs after wakeup and
before the actual syscall.

If we were to merge the utrace-syscall-resumed branch API change, then
it would be possible to use the after-resume second report_syscall_entry
to do it.  We probably do want that API change or something like it
anyway (see the previous discussion about that, which petered out).

But offhand I don't see any particular reason it matters which of these
two places this send_sig() call goes.

>    (btw, send_sig() is wrong, the child can be dead without ->signal).

... except for this. ;-)

>    This means that with utrace-ptrace ptrace_detach(sig) does not imply
>    the signal if the tracee reported PTRACE_EVENT_SYSCALL.
> 
>    Should this be fixed, or did I miss something?

One of us is missing something, because I didn't expect that result.
Since it's even in question, I think we should clearly have a case in
the ptrace-tests suite that specifically tests this behavior.
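
Such a case might look roughly like the sketch below.  It is not taken
from the ptrace-tests suite; the function names and the exit-code
convention are invented here, and it assumes Linux with ptrace(2)
available to the caller.  The child stops, the tracer runs it to a
syscall stop and detaches with SIGUSR1, and the child's exit code says
whether its handler ran.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1;

static void on_usr1(int sig)
{
	(void)sig;
	got_usr1 = 1;
}

/* Child: become a tracee, stop, make a syscall, report via exit code. */
static void tracee(void)
{
	signal(SIGUSR1, on_usr1);
	if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
		_exit(2);
	raise(SIGSTOP);
	getppid();	/* make at least one syscall after the stop */
	_exit(got_usr1 ? 0 : 1);
}

/* Kill and reap the child after a failed step. */
static int give_up(pid_t child)
{
	assert(child > 0);	/* never signal a process group by accident */
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return -1;
}

/*
 * Returns 1 if the parting signal was delivered, 0 if it was lost,
 * and -1 if the test could not be set up.
 */
int parting_signal_after_syscall_stop(void)
{
	int status;
	pid_t child = fork();

	if (child == -1)
		return -1;
	if (child == 0)
		tracee();

	/* Wait for the SIGSTOP stop. */
	if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
		return -1;
	if (WSTOPSIG(status) != SIGSTOP)
		return give_up(child);

	/* Run to the next syscall stop (plain SIGTRAP, no TRACESYSGOOD). */
	if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1)
		return give_up(child);
	if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
		return -1;
	if (WSTOPSIG(status) != SIGTRAP)
		return give_up(child);

	/* Detach from the syscall stop with a parting signal. */
	if (ptrace(PTRACE_DETACH, child, NULL, (void *)(long)SIGUSR1) == -1)
		return give_up(child);

	if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

A return of 0 would be the failure being discussed here: the detach
succeeded but the tracee never saw the signal.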

I haven't looked closely at the old utrace-ptrace.patch code in a while.
(I keep telling you, you're not fixing this old code, you're writing it
correctly yourself!)  Indeed, it looks wrong that this special-case
logic for PTRACE_EVENT_SYSCALL is not done for the PTRACE_DETACH case.

This does seem notably cleaner with the utrace-syscall-resumed API.
Then the send_sig() special case is only in the report_syscall_*
callbacks in particular (using the resumed case of the entry hook)
and not on the wakeup side.  If PTRACE_DETACH keeps the current plan
for parting signals (as I think it must), then these callbacks will
naturally do the right thing by treating resumed-with-signal the same whether it's
a normal resumption or an "almost detached" final callback.

Thanks,
Roland