[virt-tools-list] Failure to update CPU usage display

Cole Robinson crobinso at redhat.com
Wed Apr 1 23:14:09 UTC 2015


On 04/01/2015 05:42 PM, Charles Arnold wrote:
> Under certain conditions I see the guest CPU usage display not
> being updated. The problem seems most prevalent when Xen
> is the underlying hypervisor but I have seen it occasionally with
> KVM.
> 
> What appears to be happening is that when a VM is created, the
> engine's _handle_tick_queue() calls the connection tick() method
> which in turn calls _tick().  In here it calls _update_vms() which
> gets a new vmmDomain object.  Then it spawns a new thread to
> call tick_send_signals() where if 'pollvm' is True then 'self._vms'
> will get the newly create vmmDomain object.
> 
> The problem (I think) is that under certain conditions the thread
> spun up by self.idle_add(tick_send_signals) doesn't run quickly
> enough (and therefore doesn't set self._vms) until after _tick() returns
> and is called again and second vmmDomain object is created.
> 
> At this point it appears that the first vmmDomain object collects the
> cpu stats from libvirt while the second vmmDomain object updates
> the display which doesn't have the stats and therefore nothing appears.
> 

That analysis sounds reasonable. Since tick() is running in a thread, if the
mainloop is handling lots of UI stuff or similar it seems possible that tick()
could be invoked twice before the mainloop handles tick_send_signals.

Unfortunately our threading model is pretty hacky here since we have threads
touch a bunch of shared state without any locking, but it's simple and
generally works out fine.

> Below are a couple approaches to solving the problem (assuming
> my analysis is correct). This first one simply yields to give the
> tick_send_signals thread a chance to run.
>

> diff --git a/virtManager/connection.py b/virtManager/connection.py
> index a907a3f..af27141 100644
> --- a/virtManager/connection.py
> +++ b/virtManager/connection.py
> @@ -1240,6 +1240,9 @@ class vmmConnection(vmmGObject):
>                  self._change_state(self._STATE_ACTIVE)
>  
>          self.idle_add(tick_send_signals)
> +        if len(self._vms) < len(vms):
> +            # Allow time for tick_send_signals to run
> +            time.sleep(.1)
>  
>          ticklist = []
>          def add_to_ticklist(l, args=()):
> 

Obviously a timeout is sketchy since it might just make it less likely to trigger.

> This second solution doesn't wait for the thread and sets self._vms
> if pollvm is True.
> 
> diff --git a/virtManager/connection.py b/virtManager/connection.py
> index a907a3f..96c208e 100644
> --- a/virtManager/connection.py
> +++ b/virtManager/connection.py
> @@ -1240,6 +1240,8 @@ class vmmConnection(vmmGObject):
>                  self._change_state(self._STATE_ACTIVE)
>  
>          self.idle_add(tick_send_signals)
> +        if pollvm:
> +            self._vms = vms
>  
>          ticklist = []
>          def add_to_ticklist(l, args=()):
> 
> Any thoughts on this problem and the potential solutions?

We can't update any of the conn lists in the thread since it's possible the
self._vms is in use by the mainloop, which can cause weird errors if it's
updated while another part of the code is iterating over it or similar.

I think the proper thing to do short of some larger refactoring is to use a
threading.Lock(). conn.tick() will grab the lock at the start,
tick_send_signals will release the Lock when it's updated the conn state. So
tick() can't run until the previous tick_send_signals has finished.

If you can reproduce easily, does this patch fix it (and not cause issues?). I
only did light testing. Not pushed yet

Thanks,
Cole
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-connection-Protect-conn-state-tick-updates-with-a-lo.patch
Type: text/x-diff
Size: 5415 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/virt-tools-list/attachments/20150401/e1a12be9/attachment.bin>


More information about the virt-tools-list mailing list