utrace vs syscall emulation

Mon Jul 14 10:54:29 UTC 2008

Hi Renzo!  Sorry for the long delay in getting back to you about these issues.

Please take a look at the current GIT trees or patches for context.
Your comments referred to details of the old-style utrace interface,
and we have the new one in place now.  I just did some further tweaks
to syscall tracing today.

Re: 1- TIF_SYSCALL_EMU is useless.

In the new implementation, TIF_SYSCALL_EMU is left as it is for the old x86
ptrace.  With CONFIG_UTRACE_PTRACE=y it's not used at all.  So this point
is moot.  (All old ptrace code is unperturbed unless CONFIG_UTRACE_PTRACE=y.)

Re: 2- "skip syscall" management.

There are two layers to this: the arch<->generic interface, and utrace.

Firstly, just about the arch layer.  I came to your way of thinking for
having it abort/skip the syscall without losing the information of the
syscall number from its original location (orig_ax on x86).  So, now
tracehook_report_syscall_entry() returns nonzero to arch code to tell it
that it must skip the syscall and go directly to syscall-exit tracing.

The tracehook return value replaces the syscall_abort() call in
asm/syscall.h, which is now gone.  Instead, I've added a function
syscall_rollback() to the asm/syscall.h spec.  This can be used from
syscall-exit tracing to restore the user registers to what they were at
syscall entry after tracehook_report_syscall_entry() aborted the call.
(This isn't used for PTRACE_SYSEMU, since its traditional behavior is to
leave -ENOSYS in ax and the original ax in orig_ax.)  But in general for
writers of modules wanting to do syscall emulation sorts of things,
using syscall_rollback() glosses over the internal arch-specific
idiosyncrasies of syscall tracing.

The details of how best to handle a nonzero return from
tracehook_report_syscall_entry() are entirely up to the arch
and just depend on the details of its assembly paths.  I implemented
the new behavior on x86 and powerpc.  The way I did it there doesn't
have deep meaning, it's just what makes things simple and streamlined in
the assembly code on each machine.  The important thing is that it skips
the syscall without clobbering the pt_regs field that holds the syscall
number, so that syscall_rollback() has the information to find later.
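
To make the register bookkeeping concrete, here is a small user-space
model of the skip-and-rollback flow on an x86-like layout.  It is only a
sketch, not kernel code: struct model_regs, model_syscall_path(),
model_syscall_rollback() and the two sample hooks are names invented for
illustration, and the real assembly paths differ per arch.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the pt_regs fields that matter here (x86 naming). */
struct model_regs {
	long ax;	/* syscall number at entry, return value afterwards */
	long orig_ax;	/* saved syscall number; must survive a skipped call */
	long di;	/* first argument register */
};

/* Hypothetical entry hook: nonzero means "skip the syscall". */
typedef int (*model_entry_hook)(struct model_regs *regs);
typedef long (*model_syscall_body)(struct model_regs *regs);

/* Model of the arch entry path.  Returns 1 if the syscall body ran. */
static int model_syscall_path(struct model_regs *regs, model_entry_hook hook,
			      model_syscall_body body)
{
	regs->orig_ax = regs->ax;	/* remember the syscall number */
	regs->ax = -ENOSYS;		/* default result before dispatch */
	if (hook(regs) != 0)
		return 0;		/* skipped: orig_ax is left intact */
	regs->ax = body(regs);
	return 1;
}

/* Model of syscall_rollback(): put the entry-time value back in ax. */
static void model_syscall_rollback(struct model_regs *regs)
{
	regs->ax = regs->orig_ax;
}

/* Sample hook that always asks for the syscall to be skipped. */
static int model_hook_skip(struct model_regs *regs)
{
	(void)regs;
	return 1;
}

/* Sample hook that lets the syscall run. */
static int model_hook_run(struct model_regs *regs)
{
	(void)regs;
	return 0;
}

/* Sample syscall body: just returns its first argument. */
static long model_body_echo(struct model_regs *regs)
{
	return regs->di;
}
```

After a skipped call the model leaves -ENOSYS in ax and the syscall
number in orig_ax, which is the state PTRACE_SYSEMU traditionally
exposes; calling model_syscall_rollback() afterwards restores ax to the
entry-time value.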

For utrace, the report_syscall_entry callback now has more meaningful
bits in its @action argument and its return value.  These are shown in
enum utrace_syscall_action.  Similar to the report_signal callback,
those bits in the @action argument are the choice made by the previous
engine to get this callback, but the bits in your return value override
the last engine's choice (and are overridden by the next engine).  So if
your callback ignores the @action argument and just returns
UTRACE_SYSCALL_RUN, you will execute the system call that another
engine wanted aborted/emulated.
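
The override rule can be modelled in a few lines of plain C.  This is a
sketch under assumptions, not the utrace code: the enum and function
names below are local stand-ins (the real values are in enum
utrace_syscall_action), the real callback takes more arguments, and I
assume the first engine sees "run" as the incoming choice.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the UTRACE_SYSCALL_* choices. */
enum model_syscall_action {
	MODEL_SYSCALL_RUN,
	MODEL_SYSCALL_ABORT
};

/* Simplified callback: sees the previous choice, returns its own. */
typedef enum model_syscall_action
(*model_entry_cb)(enum model_syscall_action prev);

/* Call each engine in order; each return value replaces the last one. */
static enum model_syscall_action
model_report_syscall_entry(const model_entry_cb *engines, size_t n)
{
	enum model_syscall_action action = MODEL_SYSCALL_RUN; /* assumed default */
	size_t i;

	for (i = 0; i < n; i++)
		action = engines[i](action);
	return action;
}

/* Engine that wants every syscall aborted so it can emulate it. */
static enum model_syscall_action
model_engine_emulator(enum model_syscall_action prev)
{
	(void)prev;
	return MODEL_SYSCALL_ABORT;
}

/* Engine that ignores @action and always says "run". */
static enum model_syscall_action
model_engine_careless(enum model_syscall_action prev)
{
	(void)prev;
	return MODEL_SYSCALL_RUN;
}

/* Engine that has no opinion and passes the previous choice through. */
static enum model_syscall_action
model_engine_polite(enum model_syscall_action prev)
{
	return prev;
}
```

With the emulator attached first, following it with the careless engine
yields "run", while following it with the polite engine preserves
"abort".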

Under CONFIG_UTRACE_PTRACE=y, we now implement PTRACE_SYSEMU using
UTRACE_SYSCALL_ABORT and do not use TIF_SYSCALL_EMU at all.

A caveat is that when your callback's return value uses UTRACE_STOP,
your UTRACE_SYSCALL_* choice in that same return value is fixed when
that callback pass finishes.  When utrace_control() later resumes the
thread, it will wake up and either run or skip the syscall depending on
the choice made in the report_syscall_entry callback's return value
before it stopped.  This means you can't implement something like
PTRACE_SYSVM.  You can have a tracing engine that does whatever fancy
things it can do synchronously in the report_syscall_entry callback
(without blocking) to decide whether to run or skip the syscall.  But
you can't stop, e.g. to be woken up by an asynchronous wakeup call from
a user-level debugger/controller, before you make your choice.

What I am contemplating is this.  As now, when you return UTRACE_STOP,
the UTRACE_SYSCALL_* choice you return also holds sway by default,
that is, when you are woken up with UTRACE_RESUME.  But waking from
stop at syscall-entry with UTRACE_REPORT would have a special meaning.
Then a second round of report_syscall_entry callbacks is made to
interested engines, but the @action argument starts with a special
value, UTRACE_SYSCALL_REPORT.  This tells the engine that this is not
a fresh syscall entry event, but just restarting after UTRACE_STOP was
used at the same syscall entry last reported.  It now has a chance to
return its new choice of RUN or ABORT, taking into account whatever
bookkeeping might have been tweaked while we were stopped.  
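
Roughly, the idea looks like this in a user-space model.  None of it
exists in the tree; every name below is hypothetical, the stop/choice
pair is a simplification of the real return value, and only a single
engine is modelled.

```c
#include <assert.h>

/* Local stand-ins; MODEL_SYSCALL_REPORT models the proposed new value. */
enum model_action {
	MODEL_SYSCALL_RUN,
	MODEL_SYSCALL_ABORT,
	MODEL_SYSCALL_REPORT
};

/* Per-engine bookkeeping a controller may change while we are stopped. */
struct model_engine {
	int emulate;
};

/* Simplified callback result: whether to stop, plus the syscall choice. */
struct model_result {
	int stop;
	enum model_action choice;
};

static struct model_result
model_entry_cb(struct model_engine *e, enum model_action action)
{
	struct model_result r;

	if (action == MODEL_SYSCALL_REPORT) {
		/* Second round: same syscall entry, decide for real now. */
		r.stop = 0;
		r.choice = e->emulate ? MODEL_SYSCALL_ABORT : MODEL_SYSCALL_RUN;
		return r;
	}
	/* Fresh entry: stop, with "run" as the provisional choice. */
	r.stop = 1;
	r.choice = MODEL_SYSCALL_RUN;
	return r;
}

/*
 * Model of waking the stopped thread.  A plain resume keeps the choice
 * made before stopping; a report wakeup asks the engine again.
 */
static enum model_action
model_wake(struct model_engine *e, struct model_result stopped,
	   int wake_with_report)
{
	if (!wake_with_report)
		return stopped.choice;
	return model_entry_cb(e, MODEL_SYSCALL_REPORT).choice;
}
```

In this model a controller that sets emulate while the thread is stopped
only gets its way if it wakes the thread with the report flavor; a plain
resume still runs the syscall.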

I think that's a bit dubiously hairy.  I haven't done it, it's just a
thought.  But if this caveat is a real pain point for you, then we
should iron something out.

Re: 3- utrace module nesting (again)

I take your point here.  I think it probably does indeed make sense to
reverse the order of engines for syscall-entry vs all other events.
However, I still haven't done it in the current version.

The trouble with this is some hairy implementation details.
We keep the engines on an RCU-protected list to support asynchronous
attach, and that makes it impossible to safely do reverse walks.

I may want to get rid of the RCU lists for implementation reasons
anyway.  That would mean plain list.h lists where either order of
iteration is easy.  But bear with me for the moment.  As the utrace
implementation now stands, we can't change the order for one event.

At the utrace level, we can continue to work out kinks in the future.

What I hope right now is that the tracehook_report_syscall_entry()
return value protocol and syscall_rollback() are sufficient arch
interfaces for whatever we might build.  I think it's a reasonable
balance of keeping the arch work fairly minimal (especially the
assembly tweaking) and giving a higher level of functionality than
most arches had before, in a clean way.  If this arch interface
is workable from the perspective of utrace and above, then we can
stabilize the tracehooks spec and get arch folks on the job sooner.

Thanks,
Roland