[Libvir] [PATCH] Remote 0/8: Plan

Daniel P. Berrange berrange at redhat.com
Fri May 11 23:39:57 UTC 2007


On Fri, May 11, 2007 at 11:25:00PM +0100, Daniel P. Berrange wrote:
> On Wed, May 02, 2007 at 07:04:44PM +0100, Richard W.M. Jones wrote:
> > Below is the plan for delivering the remote patch for review in stages. 
> >  More details in each email.  I only expect to get through the
> > first two emails today.
> 
> I've been testing this out for real today.
> 
>  - IPv6 works correctly
>  - Once I generated the client & server certs the TLS stuff was working
>    pretty much without issue. We could do with printing clearer error
>    messages when the user typos a cert/key path name though, as the
>    current output is a little obscure.
>  - I've been testing with the QEMU driver and hit a few problems with
>    the fact that qemu_internal.c will grab URIs containing hostnames
>    in the virConnectOpen call, so qemu was getting the connection
>    before the remote driver had a chance. It should be simple to patch
>    qemu_internal to ignore URIs with a hostname set (sketched below).
> 
> This last point was in turn killing virt-manager; with a hack workaround
> it seems virt-manager more or less works. Well, with the obvious
> exception of places where virt-manager uses local state outside the
> libvirt APIs, but that's another story :-)
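
As a sketch of that last quoted point: the qemu driver's open entry
point just needs to decline any URI which carries a hostname, so that
the remote driver gets a chance to claim it. Something along these
lines would do, assuming libxml2's URI parser and libvirt's usual
driver return codes; the function name & exact signature here are
illustrative, not the actual patch:

  #include <libxml/uri.h>

  static virDrvOpenStatus
  qemuOpen (virConnectPtr conn, const char *name, int flags)
  {
      xmlURIPtr uri = xmlParseURI (name);
      if (!uri)
          return VIR_DRV_OPEN_DECLINED;

      /* A hostname means a remote connection - not ours to handle */
      if (uri->server != NULL) {
          xmlFreeURI (uri);
          return VIR_DRV_OPEN_DECLINED;
      }
      xmlFreeURI (uri);

      /* ... existing local qemu:///system & qemu:///session logic ... */
      return VIR_DRV_OPEN_SUCCESS;
  }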

I've just spent /hours/ trying to work out why

  import libvirt

  c = libvirt.openReadOnly(uri)

  # repeatedly list the defined domains & look each one up by name
  for i in range(100000):
      d = c.listDefinedDomains()

      for dom in d:
          f = c.lookupByName(dom)

was at least 100x slower with

    qemu+tcp://localhost/system 

than with

    qemu:///system

Neither the server nor the client showed *any* CPU time. The box was
completely idle, and there are no sleeps anywhere in the code either.
After much confusion I finally realized that we were being hit by
Nagle's algorithm. Since each RPC operation requires < 100 bytes to be
read & written, every write was being queued up for some tens of ms
before being sent out. That is pointless: since the RPC ops are
synchronous, we'd never fill a big enough packet to cause the data to
be sent ahead of the Nagle timeout.

When I do

      /* TCP_NODELAY is declared in <netinet/tcp.h> */
      int no_delay = 1;   /* disable Nagle's algorithm */
      setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (void *)&no_delay,
                 sizeof(no_delay));

on every socket at both the client & server end, performance using
qemu+tcp://localhost/system became basically identical to
qemu:///system. Going across a LAN/WAN the delay caused by Nagle will
be a smaller proportion of the RPC call time, due to the extra round
trip time on the real network. It is still wasteful to leave it
enabled though, because it inserts arbitrary delays, & due to the
synchronous call-reply nature of our RPC we'll never queue enough data
to fill a packet before the timeout fires. So I say disable Nagle all
the time.
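
In practice the fix amounts to a tiny helper run against every socket,
right after connect() on the client side and accept() on the server
side. A minimal sketch, assuming plain POSIX sockets (the helper name
is mine, not from the patch series):

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>

  /* Disable Nagle's algorithm on a freshly connected or accepted
   * socket.  Returns 0 on success, -1 on error with errno set. */
  static int
  disable_nagle (int sock)
  {
      int no_delay = 1;
      return setsockopt (sock, IPPROTO_TCP, TCP_NODELAY,
                         (void *) &no_delay, sizeof (no_delay));
  }

One call at each site which creates a connected socket covers every
RPC exchange, client & server alike.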

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 



