[Libguestfs] [PATCH 0/7 v2] Fix and workaround for qcow2 issues in qemu causing data corruption.

Wed Jul 4 14:17:34 UTC 2012

On Wed, Jul 04, 2012 at 02:38:50PM +0100, Pádraig Brady wrote:
> On 07/03/2012 07:03 PM, Richard W.M. Jones wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=836710
> > https://bugzilla.redhat.com/show_bug.cgi?id=836913
> > 
> > There are at least two related bugs going on:
> > 
> > (1) Linux sync(2) system call doesn't send a write barrier to the
> > disk, so in effect it doesn't force the hard disk to flush its cache.
> > libguestfs used sync(2) to force changes to disk.
> 
> Surprising. So sync(2) is currently async. Ho hum.

It's a little more complex than I said above.  sync(2) calls into each
mounted filesystem, and the filesystems should issue flush calls to
the blockdev layer (specifically calling 'blkdev_issue_flush').

But if you bypassed filesystems and wrote, say, to the partition table
or to an unmounted partition directly, then sync as it stands today
won't flush those writes.

This is in fact the symptom that we see in the test that fails: we
create a partition table, but it never gets written to the qcow2 file.
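A minimal sketch of the approach that patch 1/7 describes (sync, then fsync(2) on each whole block device), assuming the caller already has the list of device paths. The helper name sync_whole_devices is hypothetical, and this is not the actual libguestfs daemon code. On Linux, fsync on a block device file descriptor writes back that device's page cache and asks the disk to flush its write cache, which also covers writes that bypassed every filesystem, such as a freshly written partition table.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush filesystems with sync(2), then fsync(2)
 * each whole device so writes that bypassed filesystems are flushed too.
 * Returns 0 on success, -1 if any device could not be opened or flushed. */
int sync_whole_devices(const char *const *devices, size_t n)
{
  int ret = 0;

  /* Mounted filesystems flush their own dirty data here. */
  sync();

  for (size_t i = 0; i < n; ++i) {
    int fd = open(devices[i], O_RDONLY | O_CLOEXEC);
    if (fd == -1) {
      perror(devices[i]);
      ret = -1;
      continue;
    }
    /* On a block device fd this writes back the device's page cache
     * and issues a cache flush to the underlying disk. */
    if (fsync(fd) == -1) {
      perror(devices[i]);
      ret = -1;
    }
    if (close(fd) == -1) {
      perror(devices[i]);
      ret = -1;
    }
  }
  return ret;
}
```

In the appliance the list would be the whole disks (for example /dev/sda rather than /dev/sda1); how that list is obtained is left out of this sketch.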

> I just noticed Jan Kara's patch set today actually:
> https://lkml.org/lkml/2012/7/3/272
> Would fix the issue at the kernel level?

It does look as if the following patch would fix this issue ... yay!
https://lkml.org/lkml/2012/7/3/277

> >  We didn't expect
> > that qemu was caching anything because we used 'cache=none' for all
> > writable disks, but it turns out that qemu creates a writeback cache
> > anyway when you do this (you need to use 'cache=directsync' when you
> > don't want a cache at all).
> 
> And we're not using 'directsync' for performance reasons?

Right.  I didn't actually measure it, but I was assured that
performance would not be good.
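For reference, the qemu documentation for the -drive cache= option describes each mode by two properties: whether the host page cache is bypassed, and whether the guest is shown a volatile write cache that it must flush. The table-driven sketch below just encodes that summary (the struct and the function name needs_guest_flush are hypothetical). It shows why cache=none still behaves as a writeback cache while cache=directsync does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Summary of qemu's -drive cache= modes, as described in the qemu
 * documentation. */
struct cache_mode {
  const char *name;
  bool direct;     /* host page cache bypassed (O_DIRECT) */
  bool writeback;  /* guest sees a write cache that it must flush */
};

static const struct cache_mode cache_modes[] = {
  { "writeback",    false, true  },
  { "none",         true,  true  },
  { "writethrough", false, false },
  { "directsync",   true,  false },
  { "unsafe",       false, true  },  /* guest flushes are ignored */
};

/* Returns 1 if guest writes may sit in a cache until the guest flushes,
 * 0 if each write is stable once it completes, -1 for an unknown mode. */
int needs_guest_flush(const char *mode)
{
  for (size_t i = 0; i < sizeof cache_modes / sizeof cache_modes[0]; ++i)
    if (strcmp(cache_modes[i].name, mode) == 0)
      return cache_modes[i].writeback ? 1 : 0;
  return -1;
}
```

So with cache=none, O_DIRECT only removes the host page cache; the guest must still flush, which is why a sync that never reaches the block device loses data.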

> > (2) qemu's qcow2 disk cache code is buggy.  If there are I/Os in
> > flight when qemu shuts down, then qemu segfaults or assert fails.
> > This can result in unwritten data.  Unfortunately libguestfs ignored
> > the result of waitpid(2) so we didn't see this problem happening.
> > 
> > Patch 1/7 fixes the first problem by issuing fsync(2) on each whole
> > block device when we sync.
> > 
> > Patches 2/7 - 7/7 are needed to fix the second problem.  We add a new
> > API (guestfs_shutdown) so that we can actually catch the case where
> > qemu is segfaulting instead of just ignoring it.  Since qemu itself
> > isn't likely to be fixed any time soon, patch 7/7 adds a crude but
> > effective workaround to virt-resize.
> 
> thanks for looking into this tricky issue so thoroughly,
> Pádraig.
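The qemu crash went unnoticed because the exit status of the child was thrown away. A minimal sketch, assuming plain fork/waitpid process management, of reaping a child such as qemu and treating death by a signal as an error. reap_child is a hypothetical name; this is not the body of guestfs_shutdown, only the kind of check such an API makes possible.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap a child process (for example qemu) and
 * report abnormal termination instead of silently ignoring it.
 * Returns 0 only if the child exited normally with status 0. */
int reap_child(pid_t pid)
{
  int status;
  pid_t r;

  /* Retry if a signal handler interrupts the wait. */
  do
    r = waitpid(pid, &status, 0);
  while (r == -1 && errno == EINTR);

  if (r == -1) {
    perror("waitpid");
    return -1;
  }

  if (WIFEXITED(status)) {
    if (WEXITSTATUS(status) == 0)
      return 0;
    fprintf(stderr, "child exited with status %d\n", WEXITSTATUS(status));
    return -1;
  }

  if (WIFSIGNALED(status)) {
    int sig = WTERMSIG(status);
    /* A segfault or a failed assert() ends the process with SIGSEGV or
     * SIGABRT; anything still in the child's caches may be lost. */
    if (sig == SIGSEGV || sig == SIGABRT)
      fprintf(stderr, "child crashed (signal %d)\n", sig);
    else
      fprintf(stderr, "child killed by signal %d\n", sig);
    return -1;
  }

  /* Stopped/continued states are not reported without WUNTRACED. */
  return -1;
}
```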

I'm now trying to debug why qemu segfaults and fix that ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones