ntrace: interface ideas

Wed Jul 2 21:31:28 UTC 2008

On Wed, 2008-07-02 at 05:20 -0700, Roland McGrath wrote:
> Here is a vague start at the directions I have in mind for a user-level
> interface that I've been calling "ntrace".
...
> 
> I'll start with a couple of terms that I'll use later throughout the
> discussion.
> 
> A tracing session is one independent use of the interface, e.g. one
> debugger application might have one session to handle many debuggees.
> Different sessions don't interact or know about each other directly.
...
> 
> A tracing "channel" is the term I'll use to encompass the whole subject
> of transport.
...
> A channel can be a
> source of commands.  A channel can send data back to the user.
> 
> For sending data, a channel might have various buffering options and
> characteristics available.  Sending to the channel per se has to be
> nonblocking in the kernel.  
...
> 
> I think of the interface as asynchronous at base.

This certainly seems like a departure from the approach employed by
systemtap, gdb, ptrace, utrace, *probes, etc., where the traced thread
is essentially suspended from its normal operation while the
instrumentation code handles the event.  If that event handler wants to
adjust the set of events I'm tracking (e.g., turning on syscall tracing
when a particular user-mode function gets called), then finding out
about the event (i.e., the function call) several milliseconds down the
road isn't very helpful.
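
To make the timing concern concrete, here is a toy Python simulation. It is
not ntrace, utrace, or ptrace code. The event names, the run_trace() helper,
and the three-step notification delay are all made up for illustration. It
models one traced thread whose handler turns on syscall tracing when it sees
a call to a function of interest, and compares stopping the thread for the
handler against queueing the notification for later delivery.

```python
from collections import deque

def run_trace(events, synchronous, delay=3):
    """Simulate a traced thread emitting `events` in order.

    Returns the syscalls the tracer recorded. Hypothetical model: syscall
    tracing starts off and is switched on by the handler when it sees the
    "call:target" event.
    """
    syscall_tracing = False
    pending = deque()   # (deliver_at_step, event), used only when asynchronous
    recorded = []

    def handle(event):
        nonlocal syscall_tracing
        if event == "call:target":
            syscall_tracing = True

    for step, event in enumerate(events):
        # Asynchronous case: deliver notifications whose delay has elapsed.
        while pending and pending[0][0] <= step:
            handle(pending.popleft()[1])

        if event.startswith("sys:"):
            if syscall_tracing:
                recorded.append(event)
        elif synchronous:
            handle(event)   # traced thread is stopped while the handler runs
        else:
            pending.append((step + delay, event))   # thread keeps running

    return recorded

events = ["call:other", "call:target",
          "sys:open", "sys:read", "sys:write", "sys:close"]
sync_result = run_trace(events, synchronous=True)
async_result = run_trace(events, synchronous=False)
```

In this model the synchronous run records all four syscalls, while the
asynchronous run misses the ones issued before the notification arrives.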

What you're describing sounds like event logging -- which, while useful
in a lot of ways, isn't what I thought we were trying to accomplish.

> There may at some
> point be some synchronous calls to optimize the round trips.  But we
> know that by its nature an interface for handling many threads at once
> has to be asynchronous, because an event notification can always be
> occurring while you are doing an operation on another thread.  So what
> keeping it simple means for the first version is that we only worry
> about the asynchronous model.

It seems to me that we ought to consider the possibility of
multithreaded tracer apps -- e.g., where there's a tracer thread for
each traced thread.  That way there's the real possibility of catching
all the events from a multithreaded app, in a timely manner, without all
the traced threads grinding to a halt every time a thread hits an event
of interest.
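
As a rough user-space analogy only, the one-tracer-per-traced-thread shape
might look like the Python sketch below. Nothing here is a proposed ntrace
interface: the traced(), tracer(), and run() names are hypothetical, and a
queue.Queue stands in for whatever per-thread channel the kernel would
provide. Each tracer blocks on its own queue, so waiting for one traced
thread's events never holds up the others.

```python
import queue
import threading

def traced(name, channel, n_events):
    """Stand-in for a traced thread: report events, then signal exit."""
    for i in range(n_events):
        channel.put(f"{name}:event{i}")
    channel.put(None)   # sentinel: this traced thread has exited

def tracer(channel, log):
    """Stand-in for a tracer thread dedicated to one traced thread."""
    while True:
        event = channel.get()   # blocks only this tracer thread
        if event is None:
            break
        log.append(event)

def run(n_threads=3, n_events=4):
    logs = {}
    workers = []
    for t in range(n_threads):
        name = f"t{t}"
        channel = queue.Queue()   # one channel per traced thread
        logs[name] = []
        workers.append(threading.Thread(target=tracer,
                                        args=(channel, logs[name])))
        workers.append(threading.Thread(target=traced,
                                        args=(name, channel, n_events)))
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return logs

logs = run()
```

Each entry in logs then holds exactly one traced thread's events, in the
order that thread reported them.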

I think we all agree that a tracer thread should be able to block while
polling for an event.  A key question is whether a traced thread can
block waiting for the tracer app to handle the event.  Ptrace certainly
supports this.  But as I recall, you're rather adamant that a utrace
callback (other than report_quiesce) should NOT block waiting for
something to happen in user space -- e.g., because a SIGKILL can't get
delivered during that time.  Would your new utrace API support something
like this?

Seems like you're on the right track otherwise.

> 
> 
> Thanks,
> Roland
> 

Jim