[PATCH 56] ptrace_resume_signal() should use context->siginfo under ->siglock

Fri Sep 25 22:08:31 UTC 2009

> - Change ptrace_resume_signal() to use context->siginfo under ->siglock,
>   like ptrace_{get,set}siginfo() do.

I don't think the log/comments give a clear picture of why this is the
thing to do.  To wit, I'm not even sure at the moment it is necessary.
It probably is, but I have to convince myself because the comments did
not ring true.

The only "uncoordinated race" case is SIGKILL.  In that case, we know
that get_signal_to_deliver never returns at all.  So "a stack variable
can go away" is not the logic that really applies.  It can't "go away".

What happens is that utrace_get_signal->finish_resume_report->utrace_stop
gets woken up by SIGKILL, so the task will get to utrace_get_signal's
fatal_signal_pending case.  There it will retake siglock and
dequeue_signal to clobber the same siginfo_t with the SIGKILL details.

Exactly that is the one and only case that you are racing with.  
Do you agree?

This should also be the only race for PTRACE_[GS]ETSIGINFO.  Right?

So, siglock does indeed protect against that dequeue_signal call.  OTOH,
you would not need siglock in the tracer for resume if you just saved the
signo in ptrace_context and did the siginfo_t fiddling in the tracee.

This is an uncommon case at least if you do the data != info->si_signo
check unlocked, which should be safe enough (i.e. in a race case that
could matter, you will be erring on the side of taking the lock and then
can check for si_signo==SIGKILL).  Since you do need the siglock in at
least one place in the tracer for PTRACE_[GS]ETSIGINFO anyway, it is not
much worse if it's not on every resume.

It's important not to take siglock in the tracer when it can be avoided.
It's significant that old ptrace does not take it for resume calls,
though OTOH it takes it once in ptrace_check_attach, which in normal
scenarios we may avoid in utrace_control.  The reason to worry about this
is that it serializes other threads taking signals and so forth.  So it
slows down other untraced threads doing signal things, and it slows down
another traced thread on its own path to a ptrace stop when it would
otherwise not have contention when it takes siglock.  At the macro level,
it's possible this could add up to something significant when measuring
"how much does strace slow it down" or "how slow is UML".

That said, taking siglock more often may in the end be the thing to do.
But the code should not go in with comments that give a false rationale.

Thanks,
Roland