[Libguestfs] [PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.

Tue Jul 3 18:03:16 UTC 2012

https://bugzilla.redhat.com/show_bug.cgi?id=836710
https://bugzilla.redhat.com/show_bug.cgi?id=836913

There are at least two related bugs going on:

(1) The Linux sync(2) system call doesn't send a write barrier to the
disk, so in effect it doesn't force the hard disk to flush its cache.
libguestfs used sync(2) to force changes to disk.  We didn't expect
qemu to be caching anything because we used 'cache=none' for all
writable disks, but it turns out that qemu creates a writeback cache
anyway in that mode (you need to use 'cache=directsync' when you
don't want a cache at all).

(2) qemu's qcow2 disk cache code is buggy.  If there are I/Os in
flight when qemu shuts down, then qemu segfaults or fails an
assertion.  This can leave data unwritten.  Unfortunately libguestfs
ignored the result of waitpid(2), so we didn't see this problem
happening.

Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
block device when we sync.
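
To illustrate the idea behind patch 1/7, here is a rough sketch of
flushing a single device with fsync(2).  It is not the code from the
patch: the function name fsync_path is made up for this example, and
choosing which block devices to flush is left out entirely.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): open PATH read-only and
 * call fsync(2) on it.  Returns 0 on success, or -1 with errno set
 * if either open(2) or fsync(2) fails.
 */
int
fsync_path (const char *path)
{
  int fd, r, saved_errno;

  fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;

  r = fsync (fd);

  /* Preserve fsync's errno across close(2). */
  saved_errno = errno;
  close (fd);
  errno = saved_errno;

  return r;
}
```

A caller would run this once for each whole block device, for
example fsync_path ("/dev/sda") (the device name here is only an
example), and treat any -1 return as a failed sync.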

Patches 2/7 - 7/7 are needed to fix the second problem.  We add a new
API (guestfs_shutdown) so that we can actually catch the case where
qemu is segfaulting instead of just ignoring it.  Since qemu itself
isn't likely to be fixed any time soon, patch 7/7 adds a crude but
effective workaround to virt-resize.
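
For anyone unfamiliar with the waitpid(2) side of this, the following
is a minimal sketch of checking a child's exit status rather than
throwing it away.  It is not the guestfs_shutdown implementation:
the function name wait_for_clean_exit and its return convention are
assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not libguestfs code): wait for child PID and
 * return 0 only if it exited normally with status 0.  Returns -1 if
 * waitpid(2) fails, the child exited non-zero, or the child was
 * killed by a signal (eg. SIGSEGV or SIGABRT).
 */
int
wait_for_clean_exit (pid_t pid)
{
  int status;
  pid_t r;

  /* Retry if interrupted by a signal handler. */
  do {
    r = waitpid (pid, &status, 0);
  } while (r == -1 && errno == EINTR);

  if (r == -1) {
    perror ("waitpid");
    return -1;
  }

  if (WIFEXITED (status)) {
    if (WEXITSTATUS (status) == 0)
      return 0;
    fprintf (stderr, "child exited with status %d\n",
             WEXITSTATUS (status));
    return -1;
  }

  if (WIFSIGNALED (status)) {
    fprintf (stderr, "child killed by signal %d\n",
             WTERMSIG (status));
    return -1;
  }

  /* Stopped or otherwise unexpected status. */
  return -1;
}
```

The point is simply that a segfault or failed assertion shows up as
WIFSIGNALED, so a caller that checks the return value can report the
error instead of assuming that all data reached the disk.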

Rich.