[libvirt] [PATCHv2 2/2] qemu: increase the timeout before sending SIGKILL to qemu process

Eric Blake eblake at redhat.com
Fri Feb 3 22:16:27 UTC 2012


On 02/03/2012 10:06 AM, Eric Blake wrote:
> On 02/03/2012 01:24 AM, Daniel Veillard wrote:
>> On Thu, Feb 02, 2012 at 12:54:29PM -0500, Laine Stump wrote:
>>> The current default method of terminating the qemu process is to send
>>> a SIGTERM, wait for up to 1.6 seconds for it to shut down cleanly, then
>>> send a SIGKILL and wait for up to 1.4 seconds more for the process to
>>> terminate. This is problematic because occasionally 1.6 seconds is not
>>> long enough for the qemu process to flush its disk buffers, so the
>>> guest's disk ends up in an inconsistent state.
>>>
>>
>>   On the semantics of the patch, it does what it suggests; ACK to this
> 
> Agreed.
> 
>>   ACK at this heuristic attempt but maybe a smarter algorithm is
>> in order, I'm sure others will comment :-)
> 
> I'm in favor of this patch going in now; as you argued, it is a no-op
> change in the common success case, and a reliability fix (even if
> slower) in the case where it would have been giving up too early
> previously, all to benefit applications that haven't yet been adjusted
> to take advantage of the new flags.

Hmm, I've just had a second thought.  Look back at this earlier thread,
whose patch never got applied because Dan had some review comments that
I did not have time to implement:

https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html

Right now, we guarantee that we will time out in 3 seconds, but during
those three seconds, we hold the driver lock, which means that no other
application can issue any command on any other VM managed by the same
connection.
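
For anyone skimming the thread, the SIGTERM/SIGKILL escalation quoted
above can be sketched roughly as follows.  This is an illustrative
Python sketch, not libvirt's C code: the names kill_process and
wait_for_exit are made up for this example, and only the 1.6 and 1.4
second figures come from the patch description.

```python
import signal
import subprocess
import sys
import time


def wait_for_exit(proc, timeout, poll_interval=0.05):
    """Poll until proc exits or timeout elapses; return True if it exited."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(poll_interval)
    return proc.poll() is not None


def kill_process(proc, term_wait=1.6, kill_wait=1.4):
    """Ask politely with SIGTERM, then escalate to SIGKILL.

    Returns the last signal sent before the exit was observed,
    or None if the process is still alive after both waits.
    """
    proc.send_signal(signal.SIGTERM)
    if wait_for_exit(proc, term_wait):
        return signal.SIGTERM
    # The process did not exit in time; force it.
    proc.send_signal(signal.SIGKILL)
    if wait_for_exit(proc, kill_wait):
        return signal.SIGKILL
    return None


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(kill_process(child))
```

Lengthening term_wait is the whole of the proposed change; the point
below is about what else is blocked while that wait is in progress.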

If we are going to lengthen the timeout, then we also need to start
thinking about dropping the driver lock for the duration of the wait -
that is, operations on the VM being destroyed will be blocked (except
for parallel attempts to destroy the same domain), but operations on
other domains should not be blocked by the longer timeout.
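
To make the locking idea concrete, here is a toy Python sketch, again
not libvirt code.  The Driver class, its driver_lock and vm_locks
fields, and the wait_for_exit callback are all hypothetical; the only
thing it tries to show is that the slow wait happens with the per-VM
lock held and the driver-wide lock released.  It does not model the
special case of parallel destroy attempts on the same domain.

```python
import threading


class Driver:
    """Toy driver: one driver-wide lock plus one lock per VM."""

    def __init__(self, names):
        self.driver_lock = threading.Lock()
        self.vm_locks = {name: threading.Lock() for name in names}

    def _vm_lock(self, name):
        # Hold the driver lock only long enough to look up the VM.
        with self.driver_lock:
            return self.vm_locks[name]  # KeyError if the VM is gone

    def query(self, name):
        # Blocks only if this particular VM is busy.
        with self._vm_lock(name):
            with self.driver_lock:
                # The VM may have been destroyed while we waited.
                if name not in self.vm_locks:
                    raise KeyError(name)
            return name + " is running"

    def destroy(self, name, wait_for_exit):
        with self._vm_lock(name):
            # Slow part: the driver lock is NOT held here, so
            # operations on other VMs can proceed meanwhile.
            wait_for_exit()
            # Re-take the driver lock briefly to drop the VM.
            with self.driver_lock:
                self.vm_locks.pop(name, None)
```

In this sketch a query on another VM returns immediately while a
destroy is waiting, whereas a query on the VM being destroyed blocks
until the destroy finishes and then fails with KeyError.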

Even if we don't resolve Dan's concern of subdividing the driver lock
into more manageable pieces, we should at least resolve the problem of
holding the driver lock for the entire virDomainDestroy operation,
before we lengthen the timeouts involved.

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
