utrace-ptrace && ptrace_check_attach()

Mon Oct 12 01:39:02 UTC 2009

> Issues with ptrace_check_attach(),
> 
> 	- it does utrace_control(UTRACE_STOP).
> 
> 	  This is wrong, ptrace_check_attach() must be "passive",
> 	  while utrace_control(UTRACE_STOP) can actually stop the
> 	  tracee.

This is not inherently problematic on its own.  I'd say it's OK if you
have to actually be the tracer task to get that far, and of course if
you call utrace_control(UTRACE_RESUME) to reverse it in the failure
case.  The UTRACE_STOP itself has no effect the tracee can observe
except as scheduling flutter.  (I'm not really counting the tracee
cooperating with another process that runs "ps" and sees "T"; we can
live with that too.)  If a real tracer makes stupid ptrace calls that
wind up failing with ESRCH, and the side effect is lousing up the
scheduling of the tracee it's failing to otherwise molest, so be it.

It's a security issue if you can do UTRACE_STOP on a task that you would
not be allowed to ptrace, even for an instant.  It's at least quite
undesirable if you ever do that on a task other than one you do
actually have attached for ptrace.  If we can avoid those, don't worry
about the rest.  For the tracer itself, any UTRACE_STOP is "passive
enough" if compensated with a UTRACE_RESUME.  Just stay away from
UTRACE_INTERRUPT and you are being "passive" by my lights.
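A minimal sketch of that rule, written as a toy user-space model rather than kernel code. `model_tracee`, `model_check_attach()` and the `MODEL_*` requests are hypothetical stand-ins for the tracee, ptrace_check_attach() and the utrace_control() requests, and `later_checks_pass` stands in for whatever stoppedness checks follow the stop request. It only shows the two constraints: no stop request unless the caller is the attached tracer, and a compensating resume when the check then fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code.  No UTRACE_INTERRUPT equivalent
 * exists here on purpose: that is the request that is not "passive". */
enum model_request { MODEL_RESUME, MODEL_STOP };

struct model_tracee {
	int tracer;                 /* pid attached for ptrace, 0 if none */
	bool later_checks_pass;     /* stand-in for the elided stoppedness checks */
	enum model_request request; /* what our engine last asked for */
};

/* Returns 0 on success or -ESRCH, in the spirit of ptrace_check_attach(). */
static int model_check_attach(struct model_tracee *t, int caller)
{
	/* Never request a stop on a task we have not attached for ptrace;
	 * doing that even for an instant would be a security problem. */
	if (t->tracer != caller)
		return -ESRCH;

	t->request = MODEL_STOP;        /* stands in for utrace_control(UTRACE_STOP) */

	if (!t->later_checks_pass) {
		/* Compensate, so the tracee sees at most scheduling flutter. */
		t->request = MODEL_RESUME;  /* stands in for utrace_control(UTRACE_RESUME) */
		return -ESRCH;
	}
	return 0;
}
```

The ordering is the point of the design: the attachedness test comes before any stop request, so a caller who is not the tracer never touches the tracee at all.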

> 	- even if it doesn't, this is wrong when we have multiple
> 	  tracers. ptrace_check_attach() should not succeed if the
> 	  tracee was stopped by another engine.

Yes.  UTRACE_STOP state is necessary for ptrace to consider the tracee
to be "in ptrace stop", but it's not sufficient.  Ensuring UTRACE_STOP
is the mechanism for the underlying material effect that "in ptrace
stop" means, i.e. in (or on the way into) TASK_TRACED until SIGKILL.
Since we use UTRACE_STOP in report returns to implement ptrace stops,
this is normally already the case in almost all the situations where
ptrace_check_attach should succeed.  The only reason we have to use
utrace_control to ensure that's the case is for a tracee in TASK_STOPPED
(job control stop), which we have to convert into a guaranteed ptrace
stop that SIGCONT won't wake (i.e. UTRACE_STOP, i.e. TASK_TRACED).

So, we use the UTRACE_STOP interlock for that purpose.  But it's not
what we should use for "are we in a ptrace stop?" bookkeeping, which
previously was implied directly by task_is_traced().  What might make
sense as the indicator of "in ptrace stop" is "ctx->stop_code != 0" or
"ctx->resume == UTRACE_STOP".
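To make "necessary but not sufficient" concrete, here is a sketch under the same caveats: a toy model in which all `model_*` names are hypothetical, and only the `stop_code` and `resume` field names echo the ctx fields suggested above. UTRACE_STOP is checked first as the necessary condition, and ptrace's own bookkeeping then decides whether this is a ptrace stop, so a task held stopped by some other engine does not count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum model_resume { MODEL_RESUME, MODEL_STOP };

/* ptrace engine's private bookkeeping (names echo ctx->stop_code, ctx->resume). */
struct model_ctx {
	int stop_code;            /* nonzero while ptrace has reported a stop */
	enum model_resume resume; /* what ptrace's report callback last returned */
};

struct model_task {
	bool utrace_stopped;         /* held in UTRACE_STOP by *any* engine */
	struct model_ctx ptrace_ctx; /* ptrace's own state for this task */
};

static bool model_in_ptrace_stop(const struct model_task *t)
{
	/* Necessary: the task must actually be held stopped. */
	if (!t->utrace_stopped)
		return false;
	/* Sufficient only if ptrace itself recorded the stop. */
	return t->ptrace_ctx.stop_code != 0 ||
	       t->ptrace_ctx.resume == MODEL_STOP;
}
```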

> 	- it ignores "int kill".

That's for PTRACE_KILL.  This is a story similar to your question about
PTRACE_DETACH,SIGKILL.  That is, it's actually pretty useless but we
still don't want to break some arcane assumptions.  

In actual utility and reliability (and implementation), PTRACE_KILL is
exactly the same as PTRACE_CONT, SIGKILL.  There is just no point to it
at all.  But people over the years have been understandably confused by
ptrace, not to mention by its even buggier behavior during its checkered
past, and have used whatever seemed to work at the time, and we get to
bend over pretty far to avoid breaking those people's expectations now.

The manifest reality of PTRACE_KILL is that it doesn't check for
stoppedness, only attachedness, and so returns "success" (while doing
absolutely nothing useful!) on an attached task that is not stopped or
is already dead.  The former is extremely useless, but the latter is
actually harmless and probably behaves predictably today.  I will bet
that there is some code somewhere doing:

	if (kill(pid, SIGKILL)) fail_loudly();
	usleep(100);
	if (ptrace(PTRACE_KILL, pid)) fail_loudly();

or equivalent stupid things.  Today this will succeed every time and the
ptrace call will never do anything at all.  By the time ptrace is
called, the task is already dead, and so there is nothing to do, but
PTRACE_KILL succeeds anyway because of the kill flag to ptrace_check_attach.

So I think you've got to keep doing that.  I don't think it should be
particularly difficult.  The first part of the ptrace_check_attach work
is to ensure that it's attached and you're the tracer.  The second part
is to ensure that it's stopped (including converting a job control stop
into a ptrace stop).  With the kill flag, you just omit the second part.
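A sketch of that two-part split, again as a hypothetical user-space model: `model_check_attach()`, `model_state` and its values are illustrative names only, and the job-stop conversion is shown as an instantaneous state change rather than the real UTRACE_STOP interlock. The kill flag returns right after the attachedness check, so an attached tracee that is running or already dead still "succeeds".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum model_state {
	MODEL_RUNNING,
	MODEL_JOB_STOPPED,    /* TASK_STOPPED: SIGCONT could wake it */
	MODEL_PTRACE_STOPPED, /* held in UTRACE_STOP / TASK_TRACED */
	MODEL_DEAD
};

struct model_tracee {
	int tracer;             /* pid attached for ptrace, 0 if none */
	enum model_state state;
};

static int model_check_attach(struct model_tracee *t, int caller, bool kill)
{
	/* Part 1: it is attached and we are the tracer. */
	if (t->tracer != caller)
		return -ESRCH;

	/* PTRACE_KILL only ever required attachedness; keep it that way. */
	if (kill)
		return 0;

	/* Part 2: it is stopped; a job control stop is converted into a
	 * ptrace stop that SIGCONT will not wake (simplified here). */
	switch (t->state) {
	case MODEL_JOB_STOPPED:
		t->state = MODEL_PTRACE_STOPPED;
		return 0;
	case MODEL_PTRACE_STOPPED:
		return 0;
	default:
		return -ESRCH;
	}
}
```

With kill set, the already-dead tracee from the usleep example above gets 0, as PTRACE_KILL does today; the same call without the flag fails with -ESRCH.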

> Can we ignore these problems in V1 ? Or should we fix something?

It doesn't seem like it should be very hard to get this right from the
start.  I don't think we can submit something upstream that has any
known regressions even in stupid cases, except the explicit intended
"improvements" that we will document.

Thanks,
Roland