Performance tuning the Fedora Desktop

Wed May 12 09:50:36 UTC 2004

On Tue, 2004-05-11 at 15:31, Alexander Larsson wrote:

> In general, debugging desktop apps is quite different from low-level,
> non-interactive apps. First of all they are generally much more
> structurally complex, relying on many shared libraries, several
> processes with various sorts of IPC and lots of file I/O and user input.
> Secondly, the typical call traces are very deep (60 or even 80 frames
> are not uncommon), and often highly recursive. A typical backtrace
> involves several signal emissions, where each emission is on the order
> of 5 function calls deep (just for the signal emission code, not the
> called function). These functions are also typically the same, so they
> intermingle the stack traces. Take this simplified backtrace for
> instance:
> 
> A: 
> signal_a_handler () - foo(); return TRUE;
> g_signal_emit () - locate callback, call
> caller_a() -  g_signal_emit (object, "signal_a", data)
> 
> B: 
> signal_b_handler () - bar(); return TRUE;
> g_signal_emit () - locate callback, call
> caller_b() -  g_signal_emit (object, "signal_b", data)
> 
> When looking at a profile for code such as this, what you see is that
> caller_a() uses a lot of time, but when you go into it to see what it
> does, you end up looking at g_signal_emit, which gets called from lots of
> other places like B, so it's very hard to figure out how much of that is
> actually from the A call.
> 
> It gets even worse in the (very common) situation of the signal handler
> itself emitting a signal. This creates a mutual recursion into the
> g_signal_emit() function similar to the a+b case above:
> 
> signal_b_handler()
> g_signal_emit ("signal b")
> signal_a_handler ()
> g_signal_emit ("signal a")
> caller_a() 
> 
> When stepping into the g_signal_emit from signal_a_handler it seems like
> that calls signal_a_handler again, since that's another child of
> g_signal_emit. Profilers just don't handle this very well.

If you haven't already, I'd suggest looking at either the speedprof
profiler that comes with memprof (you'll have to use CVS HEAD) or the
sysprof profiler. They both have a visualization that helps with this
problem a lot. For example, the stack trace above:

  signal_b_handler()
  g_signal_emit ("signal b")
  signal_a_handler ()
  g_signal_emit ("signal a")
  caller_a() 

will be visualized as this tree:

					Self	Total

  caller_a()				  0%	100%
     g_signal_emit ()			  0%	100%
         signal_a_handler()		  0%	100%
         signal_b_handler()		100%	100%

So all the recursions through g_signal_emit() are combined and shown in
one list. In other words, you get a breakdown even for recursive code.
If either of the signal handlers were to emit another signal, the
visualization would have the same shape; only the numbers would be
different.
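
As a rough illustration of how such a combined tree can be built, here
is a minimal sketch in Python. It is not the actual speedprof or sysprof
code; the Node class and the add_sample() and render() functions are
made up for this example. The assumption is that when a function that is
already on the current path shows up again, we jump back to its existing
node instead of nesting a new one.

```python
class Node:
    """One function in the merged call tree (hypothetical helper)."""
    def __init__(self, name):
        self.name = name
        self.children = {}      # function name -> Node
        self.self_samples = 0   # samples where this was the innermost frame
        self.total_samples = 0  # samples where this appeared anywhere


def add_sample(root, stack):
    """Merge one stack trace (outermost frame first) into the tree."""
    path = [root]     # nodes on the current merged path
    touched = set()   # nodes seen in this sample, each counted once
    for name in stack:
        match = next((i for i, n in enumerate(path) if n.name == name), None)
        if match is not None:
            # Recursion: go back to the existing node for this function.
            del path[match + 1:]
        else:
            parent = path[-1]
            child = parent.children.get(name)
            if child is None:
                child = parent.children[name] = Node(name)
            path.append(child)
            touched.add(child)
    root.total_samples += 1
    for node in touched:
        node.total_samples += 1
    path[-1].self_samples += 1


def render(node, total, depth=0):
    """Return one text line per node with Self and Total percentages."""
    lines = []
    for child in node.children.values():
        label = "  " * depth + child.name + "()"
        lines.append("%-30s %4.0f%% %4.0f%%" % (
            label,
            100.0 * child.self_samples / total,
            100.0 * child.total_samples / total))
        lines.extend(render(child, total, depth + 1))
    return lines


root = Node(None)  # the root has no name, so it never matches a frame
add_sample(root, ["caller_a", "g_signal_emit", "signal_a_handler",
                  "g_signal_emit", "signal_b_handler"])
for line in render(root, root.total_samples):
    print(line)
```

Fed the single stack trace from the example, this puts both signal
handlers under one g_signal_emit node, with all of the self time on
signal_b_handler.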

> Here are a couple of issues I have with current profiling tools:
> * They have no way of profiling I/O and seeks. A lot of our problems are
> due to reading too many files, reading files too often, or paging in
> data/code. Current profilers just don't show this at all.

I agree. What would be really nice is if we could get the kernel to
provide this information for each disk access:

	- what process caused it to happen
	- what is the stack trace of the process when it happened
	- what kind of disk access
		- page fault: what address faulted
		- read: what file was read
		- other kinds of disk access

With that information it would be possible to see what parts of an
application are responsible for disk access.
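
To make that concrete, here is a small Python sketch of what a tool
could do with such records. Everything in it is hypothetical: no kernel
interface providing this data is assumed to exist, and the DiskAccess
record, its field names, and the sample events are invented for
illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiskAccess:
    """One disk access as the kernel might report it (hypothetical)."""
    pid: int                 # process that caused the access
    stack: Tuple[str, ...]   # stack trace at that moment, outermost first
    kind: str                # "page_fault", "read", ...
    detail: str              # faulting address or name of the file read


def accesses_by_function(events):
    """Count the disk accesses that happened underneath each function.

    A function is counted once per event, even if it appears several
    times on the stack because of recursion.
    """
    counts = Counter()
    for ev in events:
        for fn in set(ev.stack):
            counts[fn] += 1
    return counts


# Invented sample data, just to exercise the function.
events = [
    DiskAccess(1234, ("main", "load_icons", "read_file"), "read", "a.png"),
    DiskAccess(1234, ("main", "load_icons", "read_file"), "read", "b.png"),
    DiskAccess(1234, ("main", "init_fonts"), "page_fault", "0x40001000"),
]
counts = accesses_by_function(events)
for fn, n in counts.most_common():
    print(fn, n)
```

The same records could equally be fed into a call tree instead of a
flat count, so that disk access shows up per call path.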

Minor page faults would also be interesting to know about for startup
time, because they represent the worst case as far as page faults go
(they *could* have been major faults if a different set of pages had
been in memory).

Søren