[Libguestfs] [PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.

Pádraig Brady P at draigBrady.com
Wed Jul 4 13:38:50 UTC 2012


On 07/03/2012 07:03 PM, Richard W.M. Jones wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=836710
> https://bugzilla.redhat.com/show_bug.cgi?id=836913
> 
> There are at least two related bugs going on:
> 
> (1) Linux sync(2) system call doesn't send a write barrier to the
> disk, so in effect it doesn't force the hard disk to flush its cache.
> libguestfs used sync(2) to force changes to disk.

Surprising. So sync(2) is currently async. Ho hum.
I just noticed Jan Kara's patch set today actually:
https://lkml.org/lkml/2012/7/3/272
Would that fix the issue at the kernel level?

>  We didn't expect
> that qemu was caching anything because we used 'cache=none' for all
> writable disks, but it turns out that qemu creates a writeback cache
> anyway when you do this (you need to use 'cache=directsync' when you
> don't want a cache at all).

And we're not using 'directsync' for performance reasons?

> (2) qemu's qcow2 disk cache code is buggy.  If there are I/Os in
> flight when qemu shuts down, then qemu segfaults or assert fails.
> This can result in unwritten data.  Unfortunately libguestfs ignored
> the result of waitpid(2) so we didn't see this problem happening.
> 
> Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
> block device when we sync.
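
For anyone following along, here is a minimal sketch of what flushing one
whole block device with fsync(2) might look like. This is only an
illustration, not the code from patch 1/7: the helper name flush_device is
made up, and whether fsync(2) on a block device actually reaches the disk
cache depends on the kernel and the storage stack underneath.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not taken from the patch): open PATH and
 * call fsync(2) on it.  PATH would be a whole block device such as
 * /dev/sda inside the appliance, but any file works for testing.
 * Returns 0 on success, -1 on error with errno set.
 */
static int
flush_device (const char *path)
{
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  if (fsync (fd) == -1) {
    int saved_errno = errno;   /* close() may overwrite errno */
    close (fd);
    errno = saved_errno;
    return -1;
  }

  return close (fd);
}
```

A caller would loop over the devices it cares about and report any
failure rather than ignoring it.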
> 
> Patches 2/7 - 7/7 are needed to fix the second problem.  We add a new
> API (guestfs_shutdown) so that we can actually catch the case where
> qemu is segfaulting instead of just ignoring it.  Since qemu itself
> isn't likely to be fixed any time soon, patch 7/7 adds a crude but
> effective workaround to virt-resize.

Thanks for looking into this tricky issue so thoroughly,
Pádraig.



