F11 Preview Hard Lock

Adam Jackson ajax at redhat.com
Wed Apr 29 14:28:59 UTC 2009


On Wed, 2009-04-29 at 07:55 +0200, Richard Körber wrote:
> Hi!
> 
> > F11 Preview.i386 with all updates as of 30 minutes ago!  Every time I
> > visit www.wthitv.com <http://www.wthitv.com> (with firefox) my computer
> > hard locks with only mouse movement.
> 
> This sounds like the infamous "EQ Overflowing" bug:
>
>    https://bugzilla.redhat.com/show_bug.cgi?id=465884

"EQ overflowing" is not a bug, dang it.  It's a symptom.

Input events are handled in two places in X.  A signal handler reads the
events from the kernel, updates the cursor position, and adds the event
to the queue; and then the main loop drains the event queue, sending
events to clients and possibly updating the cursor image (say, if it
changes from an arrow to a text bar).  Since you can't malloc from
signal handlers, the event queue is some fixed (but large) size.
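To make the producer/consumer split concrete, here is a minimal sketch of a fixed-size queue in Python.  It is not the X server's code: the names EventQueue and simulate, and the capacity used below, are made up for illustration.  The point is only that the producer never allocates, so once the consumer stops draining, new events have nowhere to go.

```python
class EventQueue:
    """Fixed-capacity ring buffer; all storage is allocated once, up front."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0       # next slot the main loop will read
        self.tail = 0       # next slot the signal handler will write
        self.count = 0      # events currently queued
        self.overflows = 0  # events dropped because the queue was full

    def enqueue(self, event):
        """Producer side (stands in for the signal handler); never allocates."""
        if self.count == len(self.slots):
            self.overflows += 1  # the "EQ overflowing" condition
            return False
        self.slots[self.tail] = event
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def drain(self):
        """Consumer side (stands in for the main loop)."""
        events = []
        while self.count:
            events.append(self.slots[self.head])
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return events


def simulate(capacity, events_while_stuck):
    """Queue events while the 'main loop' is stuck, then drain once."""
    queue = EventQueue(capacity)
    for i in range(events_while_stuck):
        queue.enqueue(("motion", i))
    delivered = queue.drain()
    return len(delivered), queue.overflows
```

With a hypothetical capacity of 512, simulate(512, 100) delivers everything, while simulate(512, 600) drops the last 88 events.  Nothing is wrong with the queue in either case; the second one just went too long without a drain.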

If you run out of space in the event queue, it's because the server is
stuck somewhere away from the main loop.  But that's not any one
specific place; that's the entire rest of the X server.  You could be
waiting for the hardware to go idle.  You could be waiting to acquire
the DRI lock (in DRI1).  You could be trying to do some kernel request
that the kernel is taking its sweet time to get around to.  _Anything_.
So "EQ overflowing" is not any one particular bug, it's a symptom of
many different bugs.

In this particular case, it's something much more lame:

(gdb) bt
#0  0x00de3f73 in pixmanBltsse2 (src_bits=0xa8587000, dst_bits=0xa0587000,
    src_stride=1408, dst_stride=1408, src_bpp=32, dst_bpp=32, src_x=0,
    src_y=0, dst_x=0, dst_y=0, width=1299, height=15000)
    at pixman-sse2.c:4530
#1  0x00dd95ba in pixman_blt (src_bits=0xa8587000, dst_bits=0xa0587000,
    src_stride=1408, dst_stride=1408, src_bpp=32, dst_bpp=32, src_x=0,
    src_y=0, dst_x=0, dst_y=0, width=1299, height=15000)
    at pixman-utils.c:51
#2  0x00ef440a in fbCopyNtoN (pSrcDrawable=0x136b7530,
    pDstDrawable=0x1351e660, pGC=0x13409748, pbox=DWARF-2 expression error:
    DW_OP_reg operations must be used either alone or in conjuction with
    DW_OP_piece.
) at fbcopy.c:64
#3  0x002f0cc6 in uxa_copy_n_to_n ()
    from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x00ef344b in fbCopyRegion (pSrcDrawable=0x136b7530,
    pDstDrawable=0x1351e660, pGC=0x13409748, pDstRegion=0xbf983b04,
    dx=0, dy=0, copyProc=0x2f0570 <uxa_copy_n_to_n>, bitPlane=0,
    closure=0x0) at fbcopy.c:396
#5  0x00ef396d in fbDoCopy (pSrcDrawable=0x136b7530,
    pDstDrawable=0x1351e660, pGC=0x13409748, xIn=0, yIn=0, widthSrc=1299,
    heightSrc=15000, xOut=0, yOut=0, copyProc=0x2f0570 <uxa_copy_n_to_n>,
    bitPlane=0, closure=0x0) at fbcopy.c:596
#6  0x002f0518 in uxa_copy_area ()
    from /usr/lib/xorg/modules/drivers//intel_drv.so
#7  0x0817a753 in damageCopyArea (pSrc=0x136b7530, pDst=0x1351e660,
    pGC=0x13409748, srcx=0, srcy=0, width=1299, height=15000, dstx=0,
    dsty=0) at damage.c:949
#8  0x08084b45 in ProcCopyArea (client=0xf205018) at dispatch.c:1555
#9  0x08086807 in Dispatch () at dispatch.c:437

Check out frame 0 [1].  The server is being asked to do a blit that's
15000 pixels tall.  Who knows why; it could easily be a bug in Firefox,
but it could also be something the web page really did ask for.  At any
rate, the server's been asked to do it, so we really have no choice but
to do it.  Problem is your GPU can't accelerate that blit; even really
nice ones like R700 have a blit coordinate limit of like 8192 pixels.
So we fall back to software, which is the pixman bits in the backtrace.
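The fallback decision boils down to a bounds check.  A rough sketch, assuming the "like 8192" figure above as the limit; the function name and the constant are hypothetical, and the real driver check is not shown in this thread:

```python
# Assumed limit, taken from the "like 8192 pixels" figure in the text.
MAX_BLIT_COORD = 8192


def choose_blit_path(width, height, max_coord=MAX_BLIT_COORD):
    """Return which path a blit of this size would take in this sketch.

    Hypothetical helper: any dimension beyond the coordinate limit
    forces the software (CPU) path.
    """
    if width > max_coord or height > max_coord:
        return "software"
    return "hardware"
```

For the blit in the backtrace, choose_blit_path(1299, 15000) lands on the software path because of the height alone.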

Now the subtle bit is that, okay, yeah, it's a huge blit, but it's also
only like 75M of data (1299 x 15000 pixels at 4 bytes each).  You have
giganoms [2] of memory bandwidth, so this should take milliseconds, not
minutes.  So what's almost certainly gone
wrong here is that the kernel is giving us a bad caching policy on the
map of video memory, so all those reads have to go out to the bus every
time.  So you're not actually hardlocked.  You're just doing something
very slow, a few billion times.
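For the skeptical, the arithmetic can be checked in a few lines.  The width, height, and bpp come from frame 0 of the backtrace; the 1 GB/s bandwidth and the one-minute duration are round numbers assumed purely for illustration, not measurements from this machine.

```python
def blit_bytes(width, height, bpp):
    """Bytes of pixel data in a width x height blit at bpp bits per pixel."""
    return width * height * (bpp // 8)


def seconds_at(nbytes, bytes_per_second):
    """Time to move nbytes at a given sustained rate."""
    return nbytes / bytes_per_second


def implied_rate(nbytes, seconds):
    """Sustained rate (bytes/second) implied by moving nbytes in seconds."""
    return nbytes / seconds


size = blit_bytes(1299, 15000, 32)     # values from frame 0
size_mib = size / 2**20                # roughly 74 MiB, i.e. "like 75M"

# Assumption: a deliberately modest 1 GB/s of usable bandwidth.
fast = seconds_at(size, 1e9)           # well under a tenth of a second

# Assumption: the blit takes a full minute instead.
slow_rate = implied_rate(size, 60.0)   # on the order of 1 MB/s
```

A read rate on the order of a megabyte per second is what you'd expect from every access going out to the bus rather than from the copy loop itself being slow.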

I've been chasing this bug all week.  Hopefully it'll be fixed soon?

[1] - Also check out frame #2.  Yeah baby, X is so hardcore it breaks
gdb's DWARF decoder.  Bug #497425.

[2] - One nom being one byte per second, of course.  Om nom nom.

- ajax