[PATCH] utrace performance tests

Roland McGrath roland at redhat.com
Wed Apr 30 06:59:41 UTC 2008


Hi!  Thanks for your interest in utrace performance issues.
It is an area that has not received much attention.

> In order to make a performance comparison between ptrace and utrace, we
> modified the ptrace testsuite (http://sourceware.org/systemtap/wiki/utrace/tests)
> to report execution time.  The patch with this change is attached.

I don't think your patch gives anything you can't get just from 
running "make check TESTS_ENVIRONMENT=time", or, equivalently, putting:

	TESTS_ENVIRONMENT = time

in tests/Makefile.am and running automake.  You might find that simpler to
play around with.

I'm not sure how useful this is as a performance measurement.  These are
pretty much all regression tests that don't do much.  There is probably
much more noise than signal in just running one of those tests.

The "make xcheck" tests run lots of iterations, so they might be a little
more useful.  Still, none of these were ever intended as benchmarks.
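
For example, assuming the xcheck target honors TESTS_ENVIRONMENT the same
way the check target does, you could time those iterating tests with:

	make xcheck TESTS_ENVIRONMENT=time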

> In light of the results, I have a question: "According to these tests,
> utrace performance is currently about the same or slightly slower.  We
> guess the reason is that the ptrace() syscall API itself is the
> bottleneck, not the ptrace implementation.  Does that seem like a valid
> assumption?"

We have certainly always proceeded from the presumption that the sheer
number of system calls and context switches, and attendant synchronization
delays, necessitated by the ptrace interface dominates the overhead
associated with any given debugging task.

I'm not really clear on which kinds of performance you are interested in.
For an apples-to-apples comparison of the same ptrace user ABI on otherwise
identical vanilla and vanilla+utrace kernels, perhaps the best simple
thing to do offhand is something like:

	/usr/bin/time strace -c dd if=/dev/zero of=/dev/null count=100000

i.e., a low-overhead, do-nothing debugger (strace) on anything that
does a lot of traceable events (syscalls) quickly.
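
To separate the tracer's overhead from the workload itself, it's worth
also timing the same command bare, e.g. (just a sketch; nothing here is
utrace-specific):

	# baseline: the workload with no tracer attached
	/usr/bin/time dd if=/dev/zero of=/dev/null count=100000
	# traced: the same workload, run on each kernel under test
	/usr/bin/time strace -c dd if=/dev/zero of=/dev/null count=100000

The difference between the two times approximates the per-event tracing
cost; comparing that difference across the two kernels is the real
apples-to-apples number.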

On that kind of test I'd expect any difference to be in the noise,
swamped by the overhead of the same essential costs (syscalls and
synchronization) that any implementation of the ptrace interface must
have, as you have seen.  For individual ptrace microbenchmarks (call
PTRACE_GETREGS a million times in a tight loop or whatever), I'd expect
utrace-ptrace to be marginally slower than vanilla-ptrace.  (I expect
the same for the simple overhead test, just that it's lost in the noise
by a mile.)  That's in part just because it hasn't been measured and
tuned at all.  Also, the utrace + utrace-ptrace implementation is
substantially more complex than the vanilla-ptrace implementation,
because it has the whole infrastructure layer in there (to make possible
new things that aren't actually being done now).
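
In case it's useful, here is a rough sketch of the sort of microbenchmark
I mean: stop a child, then hammer PTRACE_GETREGS against it and time the
loop.  (Untested; x86-specific via struct user_regs_struct; error
handling kept minimal.)

	#include <stdio.h>
	#include <stdlib.h>
	#include <signal.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/ptrace.h>
	#include <sys/user.h>
	#include <sys/wait.h>
	#include <sys/time.h>

	int main(void)
	{
		const long iterations = 1000000;
		struct user_regs_struct regs;
		struct timeval start, end;
		double secs;
		int status;
		long i;
		pid_t child = fork();

		if (child == 0) {
			/* Child: ask to be traced, then stop so the
			   parent can query our registers at leisure.  */
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);
			_exit(0);
		}

		waitpid(child, &status, 0);	/* swallow the SIGSTOP */

		gettimeofday(&start, NULL);
		for (i = 0; i < iterations; ++i)
			if (ptrace(PTRACE_GETREGS, child, NULL, &regs) < 0) {
				perror("PTRACE_GETREGS");
				exit(1);
			}
		gettimeofday(&end, NULL);

		secs = (end.tv_sec - start.tv_sec)
			+ (end.tv_usec - start.tv_usec) / 1e6;
		printf("%ld calls in %.3f s (%.2f us/call)\n",
		       iterations, secs, secs * 1e6 / iterations);

		kill(child, SIGKILL);
		return 0;
	}

Run the same binary on each kernel and compare the per-call times;
pinning to one CPU (e.g. with taskset) and averaging several runs helps
keep the noise down.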

Note, it's best to test on 2.6.25 or later (current tip).  That gives
the closest comparison, because the arch parts of the code (paths in
PTRACE_GETREGS et al if you're measuring that) are now the same.

Note also, I wouldn't put too much effort into analyzing the details of
the current utrace code's ptrace implementation.  The way I tie ptrace
in is likely to change a fair bit in the near future as I get the code
in shape for upstream.

The performance measurements that might be more interesting are the
apples-to-oranges ones (i.e. "why oranges are better").  That is,
comparing some high-level debugging/tracing task, done the only way you
can do it with ptrace, to another way made possible by utrace.  That's
not a very simple undertaking given what we have to work with so far.
If that's the sort of thing you are interested in looking at, I'd be
glad to work with you on it.  I've taken some preliminary stabs in that
direction in the past, but never got all that far, and what work I did
has bit-rotted by now.


Thanks,
Roland



