[libvirt] [PATCH] phyp: too much timeout when polling socket

Daniel Veillard veillard at redhat.com
Wed Nov 11 13:46:12 UTC 2009


On Wed, Nov 11, 2009 at 11:06:50AM +0000, Daniel P. Berrange wrote:
> On Wed, Nov 11, 2009 at 11:55:29AM +0100, Daniel Veillard wrote:
> > On Wed, Nov 11, 2009 at 03:11:31AM -0200, Eduardo Otubo wrote:
> > > Hello all,
> > >
> > > Since I moved to libssh2 I had noticed a weird behaviour, virsh was  
> > > taking too much time to complete the operations when using phyp driver.  
> > > Just found the problem, 10 seconds of timeout passed to select().  
> > > Changed to zero, since I'm just polling the socket.
> > >
> > > Actually this patch is more important than it seems, now I can write a  
> > > script (using virsh) to test all the phyp features.
> > 
> >   The problem is that you end up creating a busy loop by doing this,
> > and I would be surprised if that patch was actually correct. Somewhere
> > in one of the loops calling waitsocket() you should not be making
> > that call; this is what is blocking you, and using gdb (or another
> > debugger) during that timeout will let you find out exactly where
> > this is happening. So rather than just dropping the ball and just looping
> > you should take the opportunity of a reproducible bug to debug it :-)
> > 
> >   In the meantime moving the timeout from 10s to 1 millisecond by
> > changing tv_sec to 0 and tv_usec to 1000 is certainly a good
> > workaround, it will still avoid looping needlessly, and the 1ms extra
> > timeout should not break your scripts. I will change the code
> > accordingly, but you should debug your issue with the 10s delay (or even
> > increase it if it makes things easier to set up).
> > 
> >   So I pushed that modified fix but please debug the issue :-)
> 
> I don't think that is correct really. 
> 
> This function is invoked when libssh2 gets EAGAIN, and needs to wait for
> more data to arrive on the socket. It should be waiting for select() to
> show the file handle is writable or readable, arguably using an infinite
> timeout, particularly because no caller of waitsocket() ever bothers to
> check the return value for timeout errors.

  The problem is that right now the code doesn't call it only when
getting EAGAIN from libssh2; it also calls it from a number of other
loops. It's clearly called from places where it shouldn't be (hence the
10s timeouts), and the current patch is only a band-aid. Hence my
request to do the actual debugging.
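
  To make the two positions concrete, here is a minimal sketch of a
waitsocket()-style helper, not the actual phyp driver code. The name
wait_socket_sketch(), the WAIT_READ/WAIT_WRITE flags and the millisecond
timeout parameter are all assumptions made for illustration; in the
driver the direction would come from libssh2 after it reports EAGAIN.
A negative timeout waits indefinitely, and the return value separates
"ready" from "timed out" so that callers can actually check it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe()/write()/close(), handy for trying the helper */

/* Hypothetical direction flags (not libssh2 constants). */
enum { WAIT_READ = 1, WAIT_WRITE = 2 };

/*
 * Wait until fd is ready in one of the requested directions.
 * timeout_ms < 0 blocks indefinitely; otherwise wait at most timeout_ms.
 * Returns 1 if the descriptor is ready, 0 on timeout, -1 on error.
 */
int wait_socket_sketch(int fd, int directions, long timeout_ms)
{
    fd_set rfds, wfds;
    struct timeval tv;
    struct timeval *tvp = NULL;
    int rc;

    /* FD_SET() is undefined for descriptors outside [0, FD_SETSIZE). */
    assert(fd >= 0 && fd < FD_SETSIZE);

    do {
        /* select() may modify the sets and the timeout, so rebuild them.
         * Note: after EINTR this simple version restarts the full timeout. */
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (directions & WAIT_READ)
            FD_SET(fd, &rfds);
        if (directions & WAIT_WRITE)
            FD_SET(fd, &wfds);

        if (timeout_ms >= 0) {
            tv.tv_sec = timeout_ms / 1000;
            tv.tv_usec = (timeout_ms % 1000) * 1000;
            tvp = &tv;
        }

        rc = select(fd + 1, &rfds, &wfds, NULL, tvp);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

  With this shape, a call made only after EAGAIN can pass -1 and block
until data arrives, while a call with timeout_ms = 1 corresponds to the
tv_sec = 0, tv_usec = 1000 workaround; a return of 0 then shows which
loop is calling it without anything to wait for.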

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel at veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/



