[Libguestfs] [PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.
Pádraig Brady
P at draigBrady.com
Wed Jul 4 13:38:50 UTC 2012
On 07/03/2012 07:03 PM, Richard W.M. Jones wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=836710
> https://bugzilla.redhat.com/show_bug.cgi?id=836913
>
> There are at least two related bugs going on:
>
> (1) Linux sync(2) system call doesn't send a write barrier to the
> disk, so in effect it doesn't force the hard disk to flush its cache.
> libguestfs used sync(2) to force changes to disk.
Surprising. So sync(2) is currently async. Ho hum.
I just noticed Jan Kara's patch set today actually:
https://lkml.org/lkml/2012/7/3/272
Would that fix the issue at the kernel level?
> We didn't expect
> that qemu was caching anything because we used 'cache=none' for all
> writable disks, but it turns out that qemu creates a writeback cache
> anyway when you do this (you need to use 'cache=directsync' when you
> don't want a cache at all).
And we're not using 'directsync' for performance reasons?
> (2) qemu's qcow2 disk cache code is buggy. If there are I/Os in
> flight when qemu shuts down, then qemu segfaults or assert fails.
> This can result in unwritten data. Unfortunately libguestfs ignored
> the result of waitpid(2) so we didn't see this problem happening.
>
> Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
> block device when we sync.
>
> Patches 2/7 - 7/7 are needed to fix the second problem. We add a new
> API (guestfs_shutdown) so that we can actually catch the case where
> qemu is segfaulting instead of just ignoring it. Since qemu itself
> isn't likely to be fixed any time soon, patch 7/7 adds a crude but
> effective workaround to virt-resize.
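On the ignored waitpid(2) result: the general pattern for noticing that
a child such as qemu segfaulted or hit an assertion is to inspect the
status word instead of discarding it. The sketch below is my own
illustration, not the guestfs_shutdown implementation; the name
wait_for_clean_exit is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for a child and report whether it exited
 * cleanly.  Returns 0 only if the child exited with status 0.
 * Returns -1 if it exited non-zero, was killed by a signal (for
 * example SIGSEGV or SIGABRT), or if waitpid(2) itself failed. */
int
wait_for_clean_exit (pid_t pid)
{
  int status;
  pid_t r;

  /* Retry if a signal interrupts the wait. */
  do {
    r = waitpid (pid, &status, 0);
  } while (r == -1 && errno == EINTR);

  if (r == -1)
    return -1;

  if (WIFEXITED (status) && WEXITSTATUS (status) == 0)
    return 0;

  return -1;
}
```

With a check like this, a crash at shutdown shows up as an error to the
caller instead of passing silently.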
Thanks for looking into this tricky issue so thoroughly,
Pádraig.