[Ovirt-devel] Re: [PATCH] Use multiple processes to check host status

Chris Lalancette clalance at redhat.com
Mon Jun 16 07:22:01 UTC 2008


Ian Main wrote:
> On Fri, 13 Jun 2008 14:38:16 -0700
> Ian Main <imain at redhat.com> wrote:
> 
>> This patch causes host-status to fork() up to node_count/5 times to
>> connect out to hosts via libvirt.  This guarantees that it takes at
>> most 5 timeouts in a row to verify all nodes.  This should help with the
>> bottleneck we were seeing with libvirt connect timeouts.  Testing with 105
>> nodes, almost all of which were down, it took 27s to query all of them.
> 
> Hmm, I got to thinking.. with all of the nodes on that system it was already established that there was 'no route to host', so the timeouts were quick.  A freshly killed node would take longer to time out.  We could set the process count higher to help alleviate this.  However, I see that it can take up to 10 minutes to time out a connection under certain circumstances..
> 
> It's clear we should move to having the status pushed from the node to the WUI; then timeouts will only be a problem for operations in taskomatic etc.  We may want to add a timeout to the libvirt API to deal with this at some point.
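The batching scheme from the quoted patch can be sketched roughly as follows. This is a simplified illustration, not the actual patch: the host names and the `check_host` stub are hypothetical, and a real worker would attempt a libvirt connection (e.g. `libvirt.openReadOnly()`) instead of the simulated check used here.

```python
import multiprocessing

def check_host(host):
    # Hypothetical stand-in for a libvirt connection attempt; a real
    # worker would try libvirt.openReadOnly("qemu+tcp://HOST/system")
    # and report whether it succeeded.  Here we simulate: any host
    # whose name starts with "down-" is treated as unreachable.
    return (host, not host.startswith("down-"))

def check_all(hosts):
    # Fork up to len(hosts)/5 workers, so each worker handles roughly
    # five hosts -- at most five connect timeouts are serialized in any
    # one worker, which is the bound the patch description gives.
    nprocs = max(1, len(hosts) // 5)
    with multiprocessing.Pool(nprocs) as pool:
        return dict(pool.map(check_host, hosts))
```

A caller would then do something like `check_all(["node1", "down-node2"])` and get back a dict mapping each host to its reachability.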

Right, exactly.  This was one of the problems I saw in taskomatic; if you add a
simple iptables REJECT rule against the libvirt port, it can take 10 minutes
for the connection to time out.  We definitely need to move to having status
pushed from the node to the WUI; that will both reduce network traffic (from
polling) and help alleviate this problem.  However, it won't completely
solve it; we will still need to poll for dead hosts from the WUI, and that means
we still need to deal with this possible long timeout problem.
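Until libvirt itself grows a connect timeout, one caller-side workaround is to bound the blocking call with an alarm.  A minimal sketch (the `with_timeout` helper and `ConnectTimeout` exception are hypothetical names; SIGALRM only works in the main thread, and the wrapped call would be the libvirt open):

```python
import signal

class ConnectTimeout(Exception):
    """Raised when the wrapped call does not finish in time (hypothetical)."""
    pass

def with_timeout(seconds, func, *args):
    # Arrange for SIGALRM to fire after `seconds`, interrupting a
    # blocking call such as a libvirt connection attempt.  The old
    # handler is restored and the alarm cancelled on the way out.
    def handler(signum, frame):
        raise ConnectTimeout()
    old = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old)
```

For example, `with_timeout(5, conn_open, host)` would give up after five seconds instead of waiting out a multi-minute TCP timeout.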

Chris Lalancette
