[virt-tools-list] Failure to update CPU usage display

Cole Robinson crobinso at redhat.com
Thu Apr 2 18:06:29 UTC 2015


On 04/02/2015 02:04 PM, Charles Arnold wrote:
>>>> On 4/1/2015 at 05:14 PM, Cole Robinson <crobinso at redhat.com> wrote: 
>> On 04/01/2015 05:42 PM, Charles Arnold wrote:
>>> Under certain conditions I see the guest CPU usage display not
>>> being updated. The problem seems most prevalent when Xen
>>> is the underlying hypervisor but I have seen it occasionally with
>>> KVM.
>>>
>>> What appears to be happening is that when a VM is created, the
>>> engine's _handle_tick_queue() calls the connection tick() method
>>> which in turn calls _tick().  In here it calls _update_vms() which
>>> gets a new vmmDomain object.  Then it spawns a new thread to
>>> call tick_send_signals() where if 'pollvm' is True then 'self._vms'
>>> will get the newly create vmmDomain object.
>>>
>>> The problem (I think) is that under certain conditions the thread
>>> spun up by self.idle_add(tick_send_signals) doesn't run quickly
>>> enough (and therefore doesn't set self._vms) until after _tick() returns
>>> and is called again and second vmmDomain object is created.
>>>
>>> At this point it appears that the first vmmDomain object collects the
>>> cpu stats from libvirt while the second vmmDomain object updates
>>> the display which doesn't have the stats and therefore nothing appears.
>>>
>>
>> That analysis sounds reasonable. Since tick() is running in a thread, if the
>> mainloop is handling lots of UI stuff or similar it seems possible that 
>> tick()
>> could be invoked twice before the mainloop handles tick_send_signals.
>>
>> Unfortunately our threading model is pretty hacky here since we have threads
>> touch a bunch of shared state without any locking, but it's simple and
>> generally works out fine.
>>
>>> Below are a couple approaches to solving the problem (assuming
>>> my analysis is correct). This first one simply yields to give the
>>> tick_send_signals thread a chance to run.
>>>
>>
>>> diff --git a/virtManager/connection.py b/virtManager/connection.py
>>> index a907a3f..af27141 100644
>>> --- a/virtManager/connection.py
>>> +++ b/virtManager/connection.py
>>> @@ -1240,6 +1240,9 @@ class vmmConnection(vmmGObject):
>>>                  self._change_state(self._STATE_ACTIVE)
>>>  
>>>          self.idle_add(tick_send_signals)
>>> +        if len(self._vms) < len(vms):
>>> +            # Allow time for tick_send_signals to run
>>> +            time.sleep(.1)
>>>  
>>>          ticklist = []
>>>          def add_to_ticklist(l, args=()):
>>>
>>
>> Obviously a timeout is sketchy since it might just make it less likely to 
>> trigger.
>>
>>> This second solution doesn't wait for the thread and sets self._vms
>>> if pollvm is True.
>>>
>>> diff --git a/virtManager/connection.py b/virtManager/connection.py
>>> index a907a3f..96c208e 100644
>>> --- a/virtManager/connection.py
>>> +++ b/virtManager/connection.py
>>> @@ -1240,6 +1240,8 @@ class vmmConnection(vmmGObject):
>>>                  self._change_state(self._STATE_ACTIVE)
>>>  
>>>          self.idle_add(tick_send_signals)
>>> +        if pollvm:
>>> +            self._vms = vms
>>>  
>>>          ticklist = []
>>>          def add_to_ticklist(l, args=()):
>>>
>>> Any thoughts on this problem and the potential solutions?
>>
>> We can't update any of the conn lists in the thread since it's possible the
>> self._vms is in use by the mainloop, which can cause weird errors if it's
>> updated while another part of the code is iterating over it or similar.
>>
>> I think the proper thing to do short of some larger refactoring is to use a
>> threading.Lock(). conn.tick() will grab the lock at the start,
>> tick_send_signals will release the Lock when it's updated the conn state. So
>> tick() can't run until the previous tick_send_signals has finished.
>>
>> If you can reproduce easily, does this patch fix it (and not cause issues?). 
>> I
>> only did light testing. Not pushed yet
> 
> Thanks for the patch. I have a machine that reproduces the problem easily. 
> 
> I've tested your patch for both KVM and Xen and it fixes the problem without any
> adverse side effects that I was able to see.  I started about a dozen installs on
> each (same machine booted either KVM or Xen) with a mix of ISO and network
> installation sources for KVM, Xen PV, and Xen HVM.  I would say the patch is
> good.
> 

Thanks for testing, pushed now:

commit ce74cd772647ca8a062c696f296b8afb2bd8efdf
Author: Cole Robinson <crobinso at redhat.com>
Date:   Wed Apr 1 19:10:16 2015 -0400

    connection: Protect conn state tick() updates with a lock

- Cole




More information about the virt-tools-list mailing list