Possible problem with utrace_control

Sun Jun 13 10:00:56 UTC 2010

Thanks for your interest in utrace.

It's correct that passing UTRACE_REPORT to utrace_control should
always ensure you get some report_* callback before the tracee returns
to user mode.

However, I think your use may be susceptible to a race between
resumption by utrace_control, and the UTRACE_STOP done by the
callback.  If your callback looks something like:

	wake_up_consumer();
	return UTRACE_STOP;

then there is a window when the consumer can wake up and call
utrace_control before that callback has actually returned UTRACE_STOP
and had it processed.  In that event, your UTRACE_REPORT comes before
the UTRACE_STOP, and then you do nothing afterwards to resume it.  If
it ever did resume, it would indeed report (the UTRACE_REPORT is still
"pending").  But that doesn't help you.

Off hand, I think there is only one way to address this race.  When
you get the notification to consume, before calling utrace_control,
call utrace_barrier first.  This will block the consumer until the
producer's callback has actually finished and had UTRACE_STOP recorded
for your engine.  At that point, you can be sure that your
utrace_control call will indeed wake up the stopped tracee.

Oleg or others here can double-check my reasoning and help explain the
situation if you share your code for them to see.

Because of this possibility, it might make sense to have utrace_control
return -EINPROGRESS for this situation.  That at least would alert you
that something might be amiss.  Unlike other cases of -EINPROGRESS, this
would not just mean that you need to call utrace_barrier to be sure the
effect has completed.  Rather, it means there is a danger of the
scenario I described above, where the effect you asked for is going to
be trumped by the callback's UTRACE_STOP.  So it doesn't quite fit in
with the other situations--it's not really an indication that you have
to follow up with a synchronization, it's an indication that your
synchronization logic is already wrong.  But at least it would relieve
the mystery of such a situation, assuming you are checking return values
and noticing in debugging.  I'm certainly open to opinions on this API
change.

The need to use utrace_barrier as a second synchronization after the
main wake-up (done by whatever normal means you are using) is rather
clunky, and when it matters (as we suspect it does in the occasional
race in your case) it leads to ping-pong scheduling.  So we would like
to improve the utrace API in this area at some point.  One idea we have
discussed before is a "utrace_wake_up" call.  (This assumes that some
variant of <linux/wait.h> wake_up* is what you are using in the guts of
"wake_up_consumer".)  This would replace wake_up* for use in a callback,
and simply do the wake-up after all the callbacks are finished (or
perhaps just yours, but probably all), so your return value action is
already recorded.  In fact, it might do the wakeup only after the
UTRACE_STOP is not only recorded but actually performed, so that by the
time your consumer wakes up, the tracee is actually stopped and
available for third-party examination.  (You don't actually care about
that for your case--for your purposes, it would be optimal just to delay
the wakeup until your own callback's return value is recorded.  That
way, if your consumer acts quickly enoug on another CPU, it might even
resume the tracee before it actually yields the processor.)

As you might tell from the complexity of that paragraph, there are many
fuzzy nuances of exactly what that better API might be.  (And I didn't
even delve into the possibility that you're using some synchronization
mechanism other than wake_up* calls on a wait_queue_head_t.)  Hence we
haven't tried to work it out exactly yet.  We hope that seeing a variety
of uses such as yours will help us figure out how best to improve the
utrace API for tracee-callback-to-tracer synchronization handshakes.

Thanks,
Roland